Abstract

String transductions that are definable in monadic second-order (mso) logic (without the use of parameters) are exactly those realized by deterministic two-way finite state transducers. Nondeterministic mso definable string transductions (i.e., those definable with the use of parameters) correspond to compositions of two nondeterministic two-way finite state transducers that have the finite visit property. Both families of mso definable string transductions are characterized in terms of Hennie machines, i.e., two-way finite state transducers with the finite visit property that are allowed to rewrite their input tape.



Introduction



In language theory, it is always a pleasant surprise when two formalisms, introduced with different motivations, turn out to be equally powerful, as this indicates that the underlying concept is a natural one. Additionally, this means that notions and tools from one formalism can be made use of within the other, leading to a better understanding of the formalisms under consideration. Most famous in this respect are of course the regular languages [Yu97], which can be defined using a computational formalism (finite state automata, either deterministic or nondeterministic), but also have well-known grammatical (right-linear grammars), operational (rational operations), algebraic (congruences of finite index), and logical (monadic second-order logic of one successor) characterizations [McPi43, RaSc59, Cho56, Kle56, Myh57, Ner58, Büc60, Elg61].



In this paper we study 'regular' (string-to-string) transductions, rather than regular languages, and we obtain the equivalence of particular computational and logical formalisms, modestly following in the footsteps of Büchi and Elgot. Their original work [Büc60, Elg61], demonstrating how a logical formula may effectively be transformed into a finite state automaton accepting the language specified by the formula when interpreted over finite sequences, shows how to relate the specification of a system behaviour (as given by the formula) to a possible implementation (as the finite state behaviour of an automaton). In recent years much effort has been put into transforming these initial theoretical results into software tools for the verification of finite state systems (model checking); see the monograph [Kur94]. Generalizations of the result of Büchi and Elgot include infinite strings [Büc62], trees [Don70, ThWr68], traces (a syntactic model for concurrency) [Ebi95], texts (strings with an additional ordering) [HoPa97], and tree-to-tree transductions [BlEn97, EnMa98]. We refer to [Tho97] for an overview of the study of formal languages within the framework of mathematical logic.

We give a short description of the two formalisms of 'regular' string transductions that we study in this paper. We mainly consider the deterministic case.

A two-way finite state transducer (or two-way generalized sequential machine, 2gsm) is a finite state automaton equipped with a two-way input tape and a one-way output tape. Such a transducer may freely move over its input tape, and may typically reverse or copy parts of its input string. It is, e.g., straightforward to construct a transducer realizing the relation $\{(w, ww) \mid w \in \{a,b\}^*\}$. This example shows that the regular languages are not closed under 2gsm mappings (the image of $\{a,b\}^*$ is the non-regular language $\{ww \mid w \in \{a,b\}^*\}$), contrary to their closure under one-way gsm mappings.
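To make the relation $\{(w, ww)\}$ concrete, here is a minimal sketch of a deterministic two-way transducer for it, written as a Python simulation. The state names, the characters `<` and `>` standing in for the end markers, and the encoding of the transition table as a function are illustrative assumptions. They are not the formal model defined later in the paper.

```python
# Hypothetical sketch: a deterministic two-way transducer realizing w -> ww
# over {a, b}. The tape is "<" + w + ">", where "<" and ">" stand in for the
# end markers. The machine copies w, walks back to "<", and copies w again.
LEFT, RIGHT = "<", ">"

def delta(state, sym):
    """Transition table: (state, symbol) -> (new state, output, head move)."""
    if state == "start" and sym == LEFT:
        return ("copy1", "", +1)
    if state == "copy1":
        return ("back", "", -1) if sym == RIGHT else ("copy1", sym, +1)
    if state == "back":
        return ("copy2", "", +1) if sym == LEFT else ("back", "", -1)
    if state == "copy2":
        return ("final", "", 0) if sym == RIGHT else ("copy2", sym, +1)
    return None  # no instruction: the computation blocks

def run(w, max_steps=10_000):
    tape = LEFT + w + RIGHT
    state, pos, out = "start", 0, []
    for _ in range(max_steps):
        if state == "final":
            return "".join(out)
        step = delta(state, tape[pos])
        if step is None:
            return None
        state, written, move = step
        out.append(written)
        pos += move
    raise RuntimeError("step limit exceeded")
```

The machine sweeps the input three times (right, left, right). A one-way gsm cannot reproduce this behaviour.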

However, it is well known [RaSc59, She59, HoUl79] that two-way finite state automata accept only regular languages. Consequently (using a straightforward direct product construction), the regular languages are closed under inverse 2gsm transductions. From this general result we may infer a large number of specific closure properties of the regular languages. One example is closure under the 'root' operation $\sqrt{K} = \{w \mid ww \in K\}$, since $\sqrt{K}$ is the inverse image of K under the doubling transduction $w \mapsto ww$ above. It is maybe less well known that the (deterministic) 2gsm mappings are closed under composition [ChJa77]. This result is used as a powerful tool in this paper.
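The following sketch illustrates the reduction behind closure under inverse 2gsm transductions. It decides membership in $\sqrt{K}$ by feeding the output $ww$ of the doubling transducer to an automaton for K. The language K (strings whose number of a's is divisible by 4) and the helper names are assumptions chosen for the example. The sketch only tests membership; it does not construct an automaton for $\sqrt{K}$.

```python
# Sketch: w is in sqrt(K) = {w | ww in K} iff the doubled string ww is in K.
# K is an illustrative regular language (number of a's divisible by 4),
# given by a hypothetical DFA written as a transition function.
def dfa_accepts(word, delta, start, finals):
    state = start
    for sym in word:
        state = delta(state, sym)
    return state in finals

def k_delta(state, sym):
    # counts a's modulo 4; b leaves the state unchanged
    return (state + 1) % 4 if sym == "a" else state

def in_root_of_k(w):
    doubled = w + w  # output of the doubling transducer w -> ww
    return dfa_accepts(doubled, k_delta, 0, {0})
```

For this K, one can check by hand that $\sqrt{K}$ is the set of strings with an even number of a's, which is again regular.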



The monadic second-order (mso) logic of one successor is a logical framework that allows one to specify string properties using quantification over sets of positions in the string. As stated above, Büchi and Elgot proved that the string languages specified by mso definable properties are exactly the regular languages. The logic has a natural generalization to graphs, with quantification over sets of nodes, and predicates referring to node labels and edge labels. It is used to define graph-to-graph transductions, by specifying the edges of the output graph in terms of properties of (copies of) a given input graph [Cou97, Eng97]. This is just a special case of the notion of interpretation of logical structures, well known in mathematical logic (see, e.g., [See92, Section 6]). These mso definable graph transductions play an important role in the theory of graph rewriting, as the two main families of context-free graph languages can be obtained by applying mso definable graph transductions to regular tree languages [EnOo97, CoEn95].



Here we consider mso definable string transductions, i.e., the restriction of mso definable graph transductions to linear input and output graphs. It is known that mso definable (string) transductions are closed under composition, and that the regular languages are closed under inverse mso definable transductions (recall that regular is equivalent to mso definable); see, e.g., [Cou94].



Apart from these similar closure properties, there is more evidence in the literature of the close connection between 2gsm transductions and mso definable transductions. First, various specific 2gsm transductions were shown to be mso definable, such as one-way gsm mappings, mirror image, and mapping the string w onto $w^n$ (for fixed n), cf. [Cou97, Prop. 5.5.3]. Second, returning to the theory of graph grammars, it is explained in [Eng97, pages 192-198] that the ranges (i.e., output languages) of mso definable (string) transductions are equal to the (string) languages defined by linear context-free graph grammars. By a result of [EnHe91], these languages equal the ranges of 2gsm transductions. Consequently, the two families of transductions we consider have the same generative power (on regular input). This, however, does not answer the question whether they are the same family of transductions (cf. Section 6 of [Cou94]). In this paper we answer this question positively (in the deterministic case). Thus, string transductions that are specified in mso logic can be implemented on 2gsm's, and vice versa.

Our paper is organized as follows. 

In a preliminary section we mainly recall notions and notations regarding graphs, in particular mso logic for graphs and strings. Moreover, we recall the usual, natural representation of strings as linear graphs, which allows a transparent interpretation of strings and string languages within the setting of the mso logic for graphs.

In Section 2 we study two-way machines, our incarnation of two-way generalized sequential machines. We extend the basic model by allowing the machines to 'jump' to new positions on the tape (not necessarily adjacent to the present position), as specified by an mso formula that is part of the instructions. This 'hybrid' model (in between logic and machine) facilitates the proof of our main result. We consider yet another variant of the 2gsm that allows 'regular look-around', i.e., the ability to test the strings to the left and to the right of the reading head for membership in a regular language. The equivalence of the basic 2gsm model and our two extended models (in the deterministic case) is demonstrated using the closure of 2gsm under composition and using Büchi and Elgot's result for regular languages.

In Section 3 we recall the definition of mso definable graph transduction, and restrict that general notion to mso definable string transductions by considering graph representations for strings. In addition to the representation of Section 1, we use an alternative, natural and well-known, graph representation for strings. Again it uses linear graphs, now with labels on the edges rather than on the nodes to represent the symbols of the string. These two representations differ slightly, due to an unfortunate minor technicality involving the empty string; the second representation gives more uniform results.



The main result of the paper is presented as a theorem: the equivalence of the (deterministic) 2gsm from Section 2 and the mso definable string transductions from Section 3. Section 4 contains the proof of this result. In order to transform a 2gsm into the mso formalism, we consider the 'computation space' of a 2gsm on a given input. This is the graph that has a node for each pair consisting of a tape position and a state of the 2gsm. These nodes are connected by edges representing the possible moves of the 2gsm. The transduction is then decomposed into (basically) two constructions, each of which is shown to be mso definable. First the computation space is defined in terms of the input string. Then the computation path for the input (and its resulting output string) is recovered from the computation graph. One implication of the main result then follows by the closure of mso definable (graph!) transductions under composition. The reverse implication is obtained by transforming an mso definable string transduction into a 2gsm equipped with mso instructions, the tool we introduced in Section 2.
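For a deterministic machine, the computation space idea can be sketched concretely. There is one node per pair (position, state), and one edge per node whose state has an instruction. The edge is labelled by the output string, and the output is read off along the unique path from the initial node. The sketch is only a Python illustration under stated assumptions. It reuses the instructions of the 2gsm of Example 2 in Section 2, with `<` and `>` standing for the end markers, and the function names are hypothetical. In the paper, the computation space is defined in mso logic rather than computed.

```python
LEFT, RIGHT = "<", ">"  # stand-ins for the end markers

# Instructions of the 2gsm of Example 2, as 8-tuples (p, t, q1, a1, e1, q0, a0, e0).
DELTA = [
    (0, LEFT, 1, "", +1, 0, "", 0),
    (1, "a", 1, "a", +1, 2, "", 0),
    (2, "b", 3, "", -1, 5, "", 0),
    (3, "a", 3, "b", -1, 4, "", +1),
    (4, "a", 4, "", +1, 1, "", +1),
]

def computation_space(delta, w):
    """Edges of the computation space: node (position, state) -> (next node, output).
    Only nodes whose state has an instruction get an outgoing edge."""
    tape = LEFT + w + RIGHT
    edges = {}
    for i, sym in enumerate(tape):
        for (p, t, q1, a1, e1, q0, a0, e0) in delta:
            q, a, e = (q1, a1, e1) if sym == t else (q0, a0, e0)
            if 0 <= i + e < len(tape):
                edges[(i, p)] = ((i + e, q), a)
    return edges

def output_along_path(edges, q_in, q_f):
    """Follow the unique path from (0, q_in); return its output if it reaches q_f."""
    node, out, seen = (0, q_in), [], set()
    while node[1] != q_f:
        if node in seen or node not in edges:
            return None  # the computation loops or blocks
        seen.add(node)
        node, a = edges[node]
        out.append(a)
    return "".join(out)
```

Because the machine is deterministic, every node has at most one outgoing edge. Revisiting a node therefore means the computation loops forever.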



In Section 5 we study nondeterminism. This feature can be added to mso definable transductions by introducing so-called 'parameters': free set variables in the definition of the transduction [Cou97]. The output of the transduction for a given input may then vary for different valuations of these parameters. These transductions are closed under composition, as opposed to those realized by nondeterministic 2gsm's. We conclude that, as opposed to the deterministic case, the two nondeterministic families are incomparable. Finally, we observe that the family of nondeterministic mso transductions is equal to the family of transductions defined by composing a (nondeterministic) relabelling and a deterministic transduction.

Finite visit machines form the topic of our final section, Section 6. These machines have a fixed bound on the number of times each of the positions of their input tape may be visited during a computation. We characterize the nondeterministic mso definable string transductions as compositions of two nondeterministic 2gsm's with the finite visit property. Additionally, we demonstrate that an arbitrary composition of nondeterministic 2gsm's realizes a nondeterministic mso definable string transduction if and only if that transduction is finitary, i.e., it has a finite number of images for every input string.

A more direct characterization can be obtained by considering Hennie transducers, i.e., finite visit 2gsm's that are allowed to rewrite the symbols on their input tape. These machines characterize the mso definable transductions, both in the deterministic case [ChJa77] and in the nondeterministic case.



An extended abstract of this paper is published as [EnHo99].



1 Preliminaries 

We recall some notions and results regarding graphs and their monadic second-order logic.

By $|w|$ we denote the length of the string w.

We use $\circ$ to denote the composition of binary relations (note the order): $R_1 \circ R_2 = \{(w_1, w_3) \mid \text{there exists } w_2 \text{ such that } (w_1, w_2) \in R_1 \text{ and } (w_2, w_3) \in R_2\}$, and extend it to families of binary relations: $F_1 \circ F_2 = \{R_1 \circ R_2 \mid R_1 \in F_1, R_2 \in F_2\}$.

A binary relation R is functional if $(w, z_1) \in R$ and $(w, z_2) \in R$ imply $z_1 = z_2$. It is finitary if each original is mapped to only finitely many images, i.e., the set $\{z \mid (w, z) \in R\}$ is finite for each w in the domain of R.
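On finite relations these notions can be checked directly. The sketch below represents a relation as a Python set of pairs. This representation and the function names are assumptions for illustration. `compose` follows the order convention above, so `R1` is applied first. Every finite relation is trivially finitary, so only functionality is checked.

```python
def compose(r1, r2):
    """R1 ∘ R2 = {(w1, w3) | (w1, w2) in R1 and (w2, w3) in R2}: R1 is applied first."""
    return {(w1, w3) for (w1, w2) in r1 for (v2, w3) in r2 if w2 == v2}

def is_functional(r):
    """True iff no original w has two different images."""
    images = {}
    for w, z in r:
        if images.setdefault(w, z) != z:
            return False
    return True
```

Swapping the arguments of `compose` generally changes the result. This is why the order convention matters.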

Graphs. Let $\Sigma$ and $\Gamma$ be alphabets of node labels and edge labels, respectively. A graph over $\Sigma$ and $\Gamma$ is a triple $g = (V, E, \ell)$, where V is the finite set of nodes, $E \subseteq V \times \Gamma \times V$ the set of edges, and $\ell : V \to \Sigma$ the node labelling. The set of all graphs over $\Sigma$ and $\Gamma$ is denoted by $\mathrm{GR}(\Sigma, \Gamma)$. We allow graphs that have both labelled and unlabelled nodes and edges by introducing a designated symbol $*$ to represent an 'unlabel' in our specifications, but we omit this symbol from our drawings. We write $\mathrm{GR}(*, \Gamma)$ and $\mathrm{GR}(\Sigma, *)$ to distinguish the cases when all nodes are unlabelled, and all edges are unlabelled, respectively.

Logic for graphs. For alphabets $\Sigma$ and $\Gamma$, the monadic second-order logic $\mathrm{MSO}(\Sigma, \Gamma)$ expresses properties of graphs over $\Sigma$ and $\Gamma$. The logical language uses both node variables $x, y, \ldots$ and node-set variables $X, Y, \ldots$.

There are four types of atomic formulas: $\mathrm{lab}_\sigma(x)$, meaning node x has label $\sigma$ (with $\sigma \in \Sigma$); $\mathrm{edge}_\gamma(x, y)$, meaning there is an edge from x to y with label $\gamma$ (with $\gamma \in \Gamma$); $x = y$, meaning nodes x and y are equal; and $x \in X$, meaning x is an element of X.

As usual, formulas are built from atomic formulas with the propositional connectives $\neg, \wedge, \vee, \rightarrow$, using the quantifiers $\forall$ and $\exists$ both for node variables and node-set variables.

A useful example [ThWr68] of such a formula is the binary predicate $\preceq$ claiming the existence of a (directed) path from x to y:

$$x \preceq y \;=\; (\forall X)\big[(x \in X \wedge \mathrm{closed}(X)) \rightarrow y \in X\big]$$

where $\mathrm{closed}(X) = (\forall z_1)(\forall z_2)\big(z_1 \in X \wedge \mathrm{edge}(z_1, z_2) \rightarrow z_2 \in X\big)$, and $\mathrm{edge}(z_1, z_2) = \bigvee_{\gamma \in \Gamma} \mathrm{edge}_\gamma(z_1, z_2)$. We also use $x \prec y$, where one additionally requires that $x \neq y$; for acyclic graphs this expresses the existence of a nonempty path from x to y.
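The formula for $x \preceq y$ quantifies over all node sets X. On a small graph its meaning can be checked literally by enumerating every subset of nodes and comparing the result with ordinary breadth-first reachability. The sketch is a brute-force illustration under assumptions: the graph is given as a node set plus a set of edge pairs (labels dropped, since $\mathrm{edge}$ is the disjunction over all labels), and the helper names are hypothetical. The enumeration is exponential, so it is only meant for tiny graphs.

```python
from itertools import combinations

def subsets(nodes):
    nodes = list(nodes)
    for k in range(len(nodes) + 1):
        for combo in combinations(nodes, k):
            yield set(combo)

def closed(X, edges):
    # closed(X): every edge leaving a node of X ends in X
    return all(z2 in X for (z1, z2) in edges if z1 in X)

def reach_mso(x, y, nodes, edges):
    # x ⪯ y = (∀X)[(x ∈ X ∧ closed(X)) → y ∈ X], checked over all X ⊆ V
    return all(y in X for X in subsets(nodes) if x in X and closed(X, edges))

def reach_bfs(x, y, edges):
    # ordinary graph search, for comparison
    seen, frontier = {x}, [x]
    while frontier:
        u = frontier.pop()
        for (a, b) in edges:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return y in seen
```

The two functions agree because the smallest closed set containing x is exactly the set of nodes reachable from x.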

Let $\varphi$ be a formula of $\mathrm{MSO}(\Sigma, \Gamma)$ with set $\mathcal{V}$ of free variables (of either type), and let $g = (V, E, \ell)$ be a graph in $\mathrm{GR}(\Sigma, \Gamma)$. Let $\nu$ be a valuation of $\varphi$, i.e., a mapping that assigns to each node variable $x \in \mathcal{V}$ an element $\nu(x)$ of V, and to each set variable $X \in \mathcal{V}$ a subset $\nu(X)$ of V. We write $g, \nu \models \varphi$ if $\varphi$ is satisfied in the graph g, where the free variables of $\varphi$ are valuated according to $\nu$.

Let $\varphi(x_1, \ldots, x_m, X_1, \ldots, X_n)$ be an $\mathrm{MSO}(\Sigma, \Gamma)$ formula with free node variables $x_i$ and free node-set variables $X_j$, let $u_1, \ldots, u_m$ be nodes of graph g, and let $U_1, \ldots, U_n$ be sets of nodes of g. We write $g \models \varphi(u_1, \ldots, u_m, U_1, \ldots, U_n)$ whenever $g, \nu \models \varphi(x_1, \ldots, x_m, X_1, \ldots, X_n)$, where $\nu$ is the valuation with $\nu(x_i) = u_i$ and $\nu(X_j) = U_j$.

Let $\mathcal{V}$ be a finite set of variables. The set $\{0,1\}^{\mathcal{V}}$ of 0,1-assignments to elements of $\mathcal{V}$ is finite, and may be considered as an alphabet. A $\mathcal{V}$-valuated graph over $\Sigma$ and $\Gamma$ is a graph in $\mathrm{GR}(\Sigma \times \{0,1\}^{\mathcal{V}}, \Gamma)$ such that, for every node variable x in $\mathcal{V}$, there is a unique node of the graph whose label $(\sigma, f) \in \Sigma \times \{0,1\}^{\mathcal{V}}$ satisfies $f(x) = 1$.

Clearly, such a $\mathcal{V}$-valuated graph g determines a graph $g|_\Sigma$ in $\mathrm{GR}(\Sigma, \Gamma)$, obtained by dropping the $\{0,1\}^{\mathcal{V}}$ component of its node labels. It also determines a valuation $\nu_g$ of the variables in $\mathcal{V}$, by taking

- for a node variable $x \in \mathcal{V}$, $\nu_g(x) = u$, where u is the unique node having a label $(\sigma, f)$ with $f(x) = 1$,

- for a node-set variable $X \in \mathcal{V}$, $\nu_g(X) = U$, where U consists of all nodes v having a label $(\sigma, f)$ with $f(X) = 1$.
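The passage from a $\mathcal{V}$-valuated graph g to the pair $(g|_\Sigma, \nu_g)$ is mechanical. The sketch below decodes node labels of the form $(\sigma, f)$ and leaves the edges unchanged. The dictionary representation, the explicit split into node and set variables, and the function name `decode` are assumptions for illustration.

```python
def decode(labels, node_vars, set_vars):
    """labels: dict node -> (sigma, f), with f a dict variable -> 0/1.
    Returns the plain labelling of g|Σ and the valuation ν_g."""
    plain = {v: sigma for v, (sigma, _) in labels.items()}
    nu = {}
    for x in node_vars:
        marked = [v for v, (_, f) in labels.items() if f[x] == 1]
        if len(marked) != 1:
            raise ValueError(f"node variable {x} must mark exactly one node")
        nu[x] = marked[0]
    for X in set_vars:
        nu[X] = {v for v, (_, f) in labels.items() if f[X] == 1}
    return plain, nu
```

The `ValueError` corresponds to the uniqueness requirement for node variables in the definition of a $\mathcal{V}$-valuated graph.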

For a formula $\varphi$ of $\mathrm{MSO}(\Sigma, \Gamma)$ with free variables in $\mathcal{V}$, and a $\mathcal{V}$-valuated graph g, we write $g \models \varphi$ if $\varphi$ is true for the underlying graph under the implicitly defined valuation, i.e., if $g|_\Sigma, \nu_g \models \varphi$. The formula $\varphi$ defines the graph language $GL(\varphi) = \{g \in \mathrm{GR}(\Sigma \times \{0,1\}^{\mathcal{V}}, \Gamma) \mid g \models \varphi\}$. A graph language is mso definable if there exists a closed mso formula that defines the language.

String representation. A string $w \in \Sigma^*$ of length k can be represented by the graph $\text{nd-gr}(w)$ in $\mathrm{GR}(\Sigma, *)$, consisting of k nodes labelled by the consecutive symbols of w, with $k - 1$ (unlabelled) edges representing the successor relation for the positions of the string. In the figure below, we show $\text{nd-gr}(ababb)$. Note that for the empty string $\lambda$, $\text{nd-gr}(\lambda)$ is the empty graph. With this representation, a formula $\varphi$ of $\mathrm{MSO}(\Sigma, *)$ defines the string language $L(\varphi) = \{w \in (\Sigma \times \{0,1\}^{\mathcal{V}})^* \mid \text{nd-gr}(w) \models \varphi\}$, where $\mathcal{V}$ is the set of free variables of $\varphi$; note that $\text{nd-gr}(w)$ is a $\mathcal{V}$-valuated graph over $\Sigma$ and $*$.
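The construction of $\text{nd-gr}(w)$ is simple enough to state as code. The sketch below numbers nodes $1, \ldots, k$ and uses the string `"*"` for the unlabelled edges. The tuple representation and the function name are assumptions for illustration.

```python
def nd_gr(w):
    """Graph nd-gr(w): nodes 1..k labelled by the symbols of w,
    with unlabelled ("*") edges from each position to its successor."""
    nodes = list(range(1, len(w) + 1))
    labels = {i: w[i - 1] for i in nodes}
    edges = {(i, "*", i + 1) for i in nodes[:-1]}
    return nodes, edges, labels
```

Because `w` can be any sequence, passing a list of (symbol, assignment) pairs yields the graph of a string over $\Sigma \times \{0,1\}^{\mathcal{V}}$. The empty string gives the empty graph.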



Given the close connection between the positions and their successor relation in a string w on the one hand, and the nodes and their connecting edges in $\text{nd-gr}(w)$ on the other, we say that a string w satisfies a formula $\varphi$ if $\text{nd-gr}(w) \models \varphi$.

String languages definable by monadic second-order formulas are exactly the regular languages, as shown by Büchi and Elgot.



1 Proposition ([Büc60, Elg61])

1. $L(\varphi)$ is a regular string language for every formula $\varphi$ of $\mathrm{MSO}(\Sigma, *)$.

2. A string language $K \subseteq \Sigma^*$ is regular iff there is a closed formula $\varphi$ of $\mathrm{MSO}(\Sigma, *)$ such that $K = L(\varphi)$.

We will also refer to Proposition 1 as 'Büchi's result', with due apologies to Elgot.

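Observe that the set of all strings over a fixed alphabet $\Sigma$ forms an mso definable graph language via the above representation. The defining formula for the set $\{\text{nd-gr}(w) \mid w \in \Sigma^*\}$ in $\mathrm{MSO}(\Sigma, *)$ expresses the existence of an initial and a final node (provided the graph is nonempty), and demands that every node has at most one direct successor (i.e., the edge relation is functional). 'Guards' $(\exists x)\mathrm{true} \rightarrow$ are added in order to make the empty string $\lambda$ satisfy the formula.

$$\begin{aligned} & \big((\exists x)\mathrm{true} \rightarrow (\exists x)(\forall y)(x \preceq y \wedge \neg(y \prec x))\big) \\ \wedge\; & \big((\exists x)\mathrm{true} \rightarrow (\exists x)(\forall y)(y \preceq x \wedge \neg(x \prec y))\big) \\ \wedge\; & (\forall x)(\forall y_1)(\forall y_2)\big((\mathrm{edge}(x, y_1) \wedge \mathrm{edge}(x, y_2)) \rightarrow y_1 = y_2\big) \end{aligned}$$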

As a consequence, the set of graphs representing a string language K, $\{\text{nd-gr}(w) \mid w \in K\}$, is an mso definable graph language for every regular language K.
2 Two-Way Machines

We present our (slightly nonstandard) model of two-way generalized sequential machines (2gsm), or two-way finite state transducers. To facilitate the proof of the equivalence of two-way finite state transductions and logically definable transductions, we extend the basic model to a machine model that has its input tests as well as its moves specified by mso formulas. We prove the equivalence of this extended model to the basic model. An important tool in this proof is the observation that a two-way automaton is able to keep track of the state of another (one-way) finite state automaton (proved in Lemma 3 of [HoUl67], see also p. 212 of [AHU69]). We formalize this fact by extending the 2gsm with the feature of 'regular look-around'. The equivalence of this model with the basic model is then proved using the related result of [ChJa77] stating that deterministic two-way finite state transductions are closed under composition. The equivalence of the regular look-around model with the mso formula model is proved using Büchi's result (Proposition 1).

Since we need several types of two-way machines, we first introduce a 
generic model, and then instantiate it in several ways. 

A two-way machine (2m) is a finite state device equipped with a two- 
way input tape (read only), and a one-way output tape. In each step of a 
computation the machine reads an input symbol, changes its internal state, 
outputs a string, and moves its input head, all depending on the symbol read 
and the original internal state. 

We specify a 2m as a construct $\mathcal{M} = (Q, \Sigma_1, \Sigma_2, \delta, q_{in}, q_f)$, where Q is the finite set of states, $\Sigma_1$ and $\Sigma_2$ are the input alphabet and output alphabet, $q_{in}$ and $q_f$ are the initial and the final state, and $\delta$ is a finite set of instructions. Each instruction is of the form $(p, t, q_1, \alpha_1, \mu_1, q_0, \alpha_0, \mu_0)$, where $p \in Q - \{q_f\}$ is the present state of the machine, t is a test to be performed on the input, and the triples $(q_i, \alpha_i, \mu_i)$, $i = 1, 0$, fix the action of the machine depending on the outcome of the test t: $q_i \in Q$ is the new state, $\alpha_i \in \Sigma_2^*$ is the string written on the output tape, and $\mu_i$ describes the (deterministic) move of the reading head on the input tape. The precise form of these instructions varies from one model to another, in particular the form of the test t and of the moves $\mu_i$.
The above instruction can be expressed as the following informal code. 



label p: if t then write α1 ; move μ1 ; goto q1
         else write α0 ; move μ0 ; goto q0
         fi

The string on the input tape is marked by two special symbols, $\vdash$ and $\dashv$, indicating the boundaries of the tape. So, when processing the string $\sigma_1 \cdots \sigma_n$, $\sigma_i \in \Sigma_1$, the tape has $n + 2$ reachable positions $0, 1, \ldots, n, n + 1$, containing the string $\vdash \sigma_1 \cdots \sigma_n \dashv$. The reading head is on one of these positions.

The 2m $\mathcal{M}$ realizes the transduction $m \subseteq \Sigma_1^* \times \Sigma_2^*$ such that $(w, z) \in m$ whenever there exists a computation with $\vdash w \dashv$ on the input tape, starting in the initial state $q_{in}$ with the input head on position 0 (where the symbol $\vdash$ is stored), and ending in the accepting state $q_f$, while z has been written on the output tape.

A 2m is deterministic if for each state p there is at most one instruction $(p, t, q_1, \alpha_1, \mu_1, q_0, \alpha_0, \mu_0)$ that starts in p. Note that the transduction m realized by a deterministic 2m $\mathcal{M}$ is a partial function $m : \Sigma_1^* \to \Sigma_2^*$, because the $\mu_i$ in the instructions describe deterministic moves of the reading head.

We consider the usual two-way generalized sequential machine (2gsm), introduced in [AhUl70], and two new instantiations of the generic 2m model: the 2gsm with regular look-around, and the 2gsm with mso-instructions.

2gsm. For the basic 2gsm model, each instruction $(p, t, q_1, \alpha_1, \mu_1, q_0, \alpha_0, \mu_0)$ in $\delta$ satisfies $t \in \Sigma_1 \cup \{\vdash, \dashv\}$ and $\mu_i \in \{-1, 0, +1\}$, $i = 1, 0$.

Executing an instruction $(p, \sigma, q_1, \alpha_1, \epsilon_1, q_0, \alpha_0, \epsilon_0) \in \delta$, the 2gsm, assuming it is in internal state p, when reading $\sigma$ on its input tape, changes its state to $q_1$, writes $\alpha_1$ to its output tape, and moves its head from the present position i to the position $i + \epsilon_1$ (provided $0 \leq i + \epsilon_1 \leq n + 1$). If $\sigma$ is not read on the input tape, it acts similarly according to the triple $(q_0, \alpha_0, \epsilon_0)$. Recall that there are no instructions starting in the final state.

It is more customary to formalize the instructions of a 2gsm as 5-tuples $(p, \sigma, q, \alpha, \epsilon)$, without the 'else-part' of our instructions. These two approaches are easily seen to be equivalent. Obviously, the 5-tuple can be extended to an 8-tuple by adding a dummy 'else-part', as in $(p, \sigma, q, \alpha, \epsilon, p, \lambda, 0)$. Conversely, one of our instructions $(p, \sigma, q_1, \alpha_1, \epsilon_1, q_0, \alpha_0, \epsilon_0)$ can be replaced by the 'if-part' $(p, \sigma, q_1, \alpha_1, \epsilon_1)$ and all alternatives $(p, \sigma', q_0, \alpha_0, \epsilon_0)$, $\sigma' \neq \sigma$.

For determinism we require each state to have at most one instruction, whereas the customary notion considers both state and input symbol. This somewhat unusual formulation allows us to have the above common definition of determinism for all necessary instantiations of our generic model, without having to worry about the mutual exclusiveness of the tests t. This is the reason for choosing our 8-tuple formalism.

The first of the two translations (from the 5-tuple model to our 8-tuple model) does not respect determinism. We can solve this by checking all alternatives in a given state consecutively, as follows. Let $(p, \sigma_i, q_i, \alpha_i, \epsilon_i)$, $i = 1, \ldots, k$, be all the instructions for state p in a deterministic (5-tuple) 2gsm, which means that the $\sigma_i$ are different. Introduce $k + 1$ copies $p = p^{(1)}, p^{(2)}, \ldots, p^{(k)}, p^{(k+1)}$ of p. Then the instructions $(p^{(i)}, \sigma_i, q_i, \alpha_i, \epsilon_i, p^{(i+1)}, \lambda, 0)$, $i = 1, \ldots, k$, offer the same alternatives, but sequentially rather than in parallel. No instruction starts in $p^{(k+1)}$, so the machine blocks when none of the $\sigma_i$ is read, just as the 5-tuple machine does.
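The sequentialization can be sketched as a small program transformation. Instructions are grouped by state, and the i-th alternative is turned into an 8-tuple whose else-part passes control to the next copy of the state. This is a minimal sketch under stated assumptions. The copies $p^{(i)}$ are represented as pairs `(p, i)` (with $p^{(1)} = p$ itself), and these pairs are assumed not to clash with existing state names. The empty output $\lambda$ is written `""`.

```python
def sequentialize(five_tuples):
    """Turn deterministic 5-tuple instructions (p, a, q, out, e) into 8-tuples
    (p, a, q1, out1, e1, q0, out0, e0) with at most one instruction per state."""
    by_state = {}
    for (p, a, q, out, e) in five_tuples:
        by_state.setdefault(p, []).append((a, q, out, e))
    result = []
    for p, alternatives in by_state.items():
        for i, (a, q, out, e) in enumerate(alternatives, start=1):
            src = p if i == 1 else (p, i)  # p^(1) is p itself
            nxt = (p, i + 1)               # p^(i+1): try the next alternative
            result.append((src, a, q, out, e, nxt, "", 0))
    return result
```

The last copy `(p, k + 1)` receives no instruction. The resulting machine therefore blocks in exactly the situations where the original one had no applicable instruction.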

2 Example. Consider the string transduction

$$\{(a^{i_1} b a^{i_2} b \cdots a^{i_n} b a^{i_{n+1}},\ a^{i_1} b^{i_1} a^{i_2} b^{i_2} \cdots a^{i_n} b^{i_n} a^{i_{n+1}}) \mid n \geq 0,\ i_1, \ldots, i_{n+1} \geq 0\}.$$

An obvious deterministic 2gsm reads each segment of a's from left to right while copying it to the output. When encountering a b, it rereads the segment from right to left; during this second pass it writes b's to the output tape.

This machine can be implemented by taking $\Sigma_1 = \Sigma_2 = \{a, b\}$, $Q = \{0, 1, 2, 3, 4, 5\}$, $q_{in} = 0$, $q_f = 5$, and $\delta$ consisting of the instructions

$(0, \vdash, 1, \lambda, +1, 0, \lambda, 0)$
$(1, a, 1, a, +1, 2, \lambda, 0)$
$(2, b, 3, \lambda, -1, 5, \lambda, 0)$
$(3, a, 3, b, -1, 4, \lambda, +1)$
$(4, a, 4, \lambda, +1, 1, \lambda, +1)$

Note that the last three elements of the first instruction are irrelevant. 

The computation of the 2gsm on input aaabbaba can be visualized as in Figure 1, where we have labelled the edges of the computation by the strings that are written to the output (with $\lambda$ omitted, for convenience). □
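The instructions of Example 2 can be run directly. The sketch below is a small simulator for deterministic 2gsm's in the 8-tuple format, loaded with exactly the five instructions listed above. The characters `<` and `>` stand for $\vdash$ and $\dashv$, `""` stands for $\lambda$, and the function name and step limit are assumptions for illustration.

```python
LEFT, RIGHT = "<", ">"  # stand-ins for the end markers

# Example 2 as 8-tuples (p, t, q1, a1, e1, q0, a0, e0); "" is the empty output λ.
DELTA = [
    (0, LEFT, 1, "", +1, 0, "", 0),
    (1, "a", 1, "a", +1, 2, "", 0),
    (2, "b", 3, "", -1, 5, "", 0),
    (3, "a", 3, "b", -1, 4, "", +1),
    (4, "a", 4, "", +1, 1, "", +1),
]

def run_2dgsm(delta, w, q_in=0, q_f=5, max_steps=100_000):
    """Simulate a deterministic 2gsm (at most one instruction per state)."""
    program = {instr[0]: instr[1:] for instr in delta}
    tape = LEFT + w + RIGHT
    state, pos, out = q_in, 0, []
    for _ in range(max_steps):
        if state == q_f:
            return "".join(out)
        if state not in program:
            return None  # no instruction: the computation blocks
        t, q1, a1, e1, q0, a0, e0 = program[state]
        q, a, e = (q1, a1, e1) if tape[pos] == t else (q0, a0, e0)
        if not 0 <= pos + e < len(tape):
            return None  # the head may not leave the tape
        state, pos = q, pos + e
        out.append(a)
    raise RuntimeError("step limit exceeded; the machine may be looping")
```

On the input aaabbaba of Figure 1, the simulator produces aaabbbaba, i.e., $a^3b^3aba$.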



Figure 1: Computation for $(a^3b^2aba, a^3b^3aba)$ of the 2gsm from Example 2.

Look-around. A 2gsm with regular look-around (2gsm-rla) extends the basic 2gsm model by allowing more complicated tests. In an instruction $(p, t, q_1, \alpha_1, \epsilon_1, q_0, \alpha_0, \epsilon_0) \in \delta$ all components are as before for the 2gsm, except the test t. The test does not consist of a single letter $\sigma$, but of a triple $t = (R_\ell, \sigma, R_r)$, where $\sigma \in \Sigma_1 \cup \{\vdash, \dashv\}$, and $R_\ell, R_r$ are regular languages such that $R_\ell, R_r \subseteq (\Sigma_1 \cup \{\vdash, \dashv\})^*$. This test t is satisfied if $\sigma$ is the symbol under the reading head, and the strings to the left and to the right of the head belong to $R_\ell$ and $R_r$, respectively.
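A look-around test can be evaluated literally: compare the symbol under the head, and match the prefix and the suffix of the tape against the two languages. In the sketch below, $R_\ell$ and $R_r$ are assumed to be given as Python regular expressions, and `<` and `>` stand for $\vdash$ and $\dashv$. Both are illustration choices. `re.fullmatch` is used so that the whole prefix or suffix must belong to the language.

```python
import re

def look_around(tape, pos, r_left, sigma, r_right):
    """Test t = (R_l, sigma, R_r) at head position pos of the tape "<w>":
    sigma is under the head, the prefix is in R_l and the suffix is in R_r."""
    return (tape[pos] == sigma
            and re.fullmatch(r_left, tape[:pos]) is not None
            and re.fullmatch(r_right, tape[pos + 1:]) is not None)
```

For instance, on the tape `"<abba>"` the test `(r"<a*", "b", r"ba>")` holds at position 2 but not at position 3.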

Obviously, it suffices to have tests $(R_\ell, \sigma, R_r)$ such that $R_\ell \cdot \sigma \cdot R_r \subseteq {\vdash}\Sigma_1^*{\dashv}$. For a given 2gsm-rla, an equivalent 2gsm-rla with that property is obtained by changing each test $(R_\ell, \sigma, R_r)$ into $(R'_\ell, \sigma, R'_r)$, where $R'_\ell = R_\ell \cap {\vdash}\Sigma_1^*$ (with the exception that $R'_\ell = \{\lambda\}$ when $\sigma = \vdash$), and similarly for $R'_r$. We observe here that this notion of 'regular look-around' generalizes the well-known notion of regular look-ahead for one-way automata (see, e.g., [Nij82, Eng77]).



Mso instructions. For a 2gsm with mso-instructions (2gsm-mso), the test and the moves of each instruction are given by mso formulas. To be precise, for $(p, t, q_1, \alpha_1, \mu_1, q_0, \alpha_0, \mu_0) \in \delta$, t is given as a formula $\varphi(x)$ in $\mathrm{MSO}(\Sigma_1 \cup \{\vdash, \dashv\}, *)$ with one free node variable x. The moves $\mu_i$ are given by functional formulas $\varphi_i(x, y)$ in $\mathrm{MSO}(\Sigma_1 \cup \{\vdash, \dashv\}, *)$ with two free node variables x and y (see below for the meaning of 'functional').

A test $t = \varphi(x)$ is evaluated for the string on the input tape with x valuated as the position taken by the reading head. More precisely, as our logic is defined for graphs, t is true whenever $\text{nd-gr}(\vdash w \dashv) \models \varphi(u)$, where w is the input string and u is the node corresponding to the position of the reading head.
The 2gsm-mso does not move step-wise on the input tape, but 'jumps' as specified by the formulas $\varphi_i(x, y)$, as follows. Assuming the machine is in position u, it moves to a position v for which $\text{nd-gr}(\vdash w \dashv) \models \varphi_i(u, v)$, where we have identified positions on the input tape with their corresponding nodes of the graph $\text{nd-gr}(\vdash w \dashv)$.

To guarantee that the $\varphi_i(x, y)$ describe deterministic moves of the reading head, we require that the relations specified by $\varphi_i(x, y)$ are functional for each input string w, i.e., for every position u there is at most one position v such that $\text{nd-gr}(\vdash w \dashv) \models \varphi_i(u, v)$. Note that functionality is expressible in the logic: $(\forall x)(\forall y_1)(\forall y_2)[\varphi_i(x, y_1) \wedge \varphi_i(x, y_2) \rightarrow y_1 = y_2]$. Consequently, it is decidable: we may use Büchi's result (Proposition 1, which is effective) to verify that it is satisfied by every string in ${\vdash}\Sigma_1^*{\dashv}$.
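For one fixed input, functionality of a move can be checked by brute force, by counting the targets v of every position u. This only illustrates the property on a single tape. The decision procedure in the text covers all inputs at once via Büchi's result. In the sketch, moves are written as Python predicates on positions of the tape `"<w>"` (`<` and `>` standing for $\vdash$ and $\dashv$), and the three sample predicates are hypothetical examples.

```python
def is_functional_move(psi, tape):
    """Check on this tape that every position u has at most one target v."""
    n = len(tape)
    return all(sum(1 for v in range(n) if psi(tape, u, v)) <= 1
               for u in range(n))

def succ(tape, x, y):             # edge_*(x, y): y is the next position
    return y == x + 1

def to_right_marker(tape, x, y):  # lab_⊣(y), with ">" standing for ⊣
    return tape[y] == ">"

def somewhere_right(tape, x, y):  # x ≺ y: not functional in general
    return x < y
```

The predicate `somewhere_right` (the relation $x \prec y$) fails the check as soon as two positions lie to the right of some u. It can therefore not be used as a move.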

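3 Example. Consider again the string transduction

$$m = \{(a^{i_1} b a^{i_2} b \cdots a^{i_n} b a^{i_{n+1}},\ a^{i_1} b^{i_1} a^{i_2} b^{i_2} \cdots a^{i_n} b^{i_n} a^{i_{n+1}}) \mid n \geq 0,\ i_1, \ldots, i_{n+1} \geq 0\}.$$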

We use the predicate $\mathrm{next}_a(x, y)$ to specify the first position y following x that is labelled by a:

$$x \prec y \wedge \mathrm{lab}_a(y) \wedge (\forall z)[(x \prec z \wedge z \prec y) \rightarrow \neg\mathrm{lab}_a(z)]$$

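Similarly, we construct an expression $\mathrm{fis}_a(x, y)$ denoting the first a in the present segment of a's:

$$y \preceq x \wedge (\forall z)(y \preceq z \wedge z \preceq x \rightarrow \mathrm{lab}_a(z)) \wedge \neg(\exists z)(\mathrm{edge}_*(z, y) \wedge \mathrm{lab}_a(z))$$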

Using these predicates we build a deterministic 2gsm-mso that realizes 
m. In state 1 it walks along a segment of a's, copying it to the output tape. 
Then, when the segment is followed by a b, it jumps back to the first a of the 
segment for a second pass, in state 2. When the end of the segment is reached 
for the second time, the machine jumps to the next segment, returning to 
state 1. At the last a of the input the machine jumps to the right end marker, 
and halts in the final state 3. 

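Let $\Sigma_1 = \Sigma_2 = \{a, b\}$, $Q = \{1, 1', 2, 2', 3\}$, $q_{in} = 2'$, $q_f = 3$, and $\delta$ consisting of the instructions

$(1,\ (\exists y)(\mathrm{edge}_*(x, y) \wedge \mathrm{lab}_a(y)),\ 1,\ a,\ \mathrm{edge}_*(x, y),\ 1',\ \lambda,\ x = y)$
$(1',\ (\exists y)(\mathrm{edge}_*(x, y) \wedge \mathrm{lab}_b(y)),\ 2,\ b,\ \mathrm{fis}_a(x, y),\ 3,\ \lambda,\ \mathrm{lab}_\dashv(y))$
$(2,\ (\exists y)(\mathrm{edge}_*(x, y) \wedge \mathrm{lab}_a(y)),\ 2,\ b,\ \mathrm{edge}_*(x, y),\ 2',\ \lambda,\ x = y)$
$(2',\ (\exists y)(x \prec y \wedge \mathrm{lab}_a(y)),\ 1,\ a,\ \mathrm{next}_a(x, y),\ 3,\ \lambda,\ \mathrm{lab}_\dashv(y))$

Figure 2: Computation for $(a^3b^2aba, a^3b^3aba)$ of the 2gsm-mso from Example 3.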

The computation of the machine on input $a^3b^2aba$ can be visualized as in Figure 2 (where, again, $\lambda$ is omitted from the edges of the computation). □
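The machine of Example 3 can be simulated by evaluating its formulas by brute force on the positions of the tape. Each test is a predicate of the head position x. Each move is a relation $\psi(x, y)$, and the machine jumps to the unique y satisfying it (it blocks if there is none). The sketch transcribes $\mathrm{next}_a$ and $\mathrm{fis}_a$ quantifier by quantifier. It relies on stated assumptions: `<` and `>` stand for $\vdash$ and $\dashv$, `""` stands for $\lambda$, and $\prec$ and $\preceq$ are implemented as `<` and `<=` on positions, which is valid only because the tape graph is linear.

```python
LEFT, RIGHT = "<", ">"  # stand-ins for the end markers

# Atomic predicates on positions of the linear graph of the tape "<w>".
def lab(tape, y, s): return tape[y] == s
def edge(tape, x, y): return y == x + 1
def prec(tape, x, y): return x < y      # x ≺ y
def preceq(tape, x, y): return x <= y   # x ⪯ y

def next_a(tape, x, y):
    zs = range(len(tape))
    return (prec(tape, x, y) and lab(tape, y, "a")
            and all(not lab(tape, z, "a") for z in zs
                    if prec(tape, x, z) and prec(tape, z, y)))

def fis_a(tape, x, y):
    zs = range(len(tape))
    return (preceq(tape, y, x)
            and all(lab(tape, z, "a") for z in zs
                    if preceq(tape, y, z) and preceq(tape, z, x))
            and not any(edge(tape, z, y) and lab(tape, z, "a") for z in zs))

def next_is(s):  # test (∃y)(edge_*(x, y) ∧ lab_s(y))
    return lambda tape, x: any(edge(tape, x, y) and lab(tape, y, s)
                               for y in range(len(tape)))

def a_to_right(tape, x):  # test (∃y)(x ≺ y ∧ lab_a(y))
    return any(prec(tape, x, y) and lab(tape, y, "a") for y in range(len(tape)))

def stay(tape, x, y): return x == y
def to_end(tape, x, y): return lab(tape, y, RIGHT)  # lab_⊣(y)

# state -> (test, q1, out1, move1, q0, out0, move0)
DELTA = {
    "1":  (next_is("a"), "1", "a", edge, "1'", "", stay),
    "1'": (next_is("b"), "2", "b", fis_a, "3", "", to_end),
    "2":  (next_is("a"), "2", "b", edge, "2'", "", stay),
    "2'": (a_to_right, "1", "a", next_a, "3", "", to_end),
}

def run_mso(w, q_in="2'", q_f="3", max_steps=100_000):
    tape = LEFT + w + RIGHT
    state, pos, out = q_in, 0, []
    for _ in range(max_steps):
        if state == q_f:
            return "".join(out)
        test, q1, a1, m1, q0, a0, m0 = DELTA[state]
        q, a, move = (q1, a1, m1) if test(tape, pos) else (q0, a0, m0)
        targets = [y for y in range(len(tape)) if move(tape, pos, y)]
        if len(targets) != 1:
            return None  # no target (or a non-functional move): block
        state, pos = q, targets[0]
        out.append(a)
    raise RuntimeError("step limit exceeded")
```

On the input $a^3b^2aba$ of Figure 2, this yields $a^3b^3aba$, the same result as the step-wise machine of Example 2.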

Without loss of generality we assume that the 2m's we consider never write more than one symbol at a time, i.e., for each instruction $(p, t, q_1, \alpha_1, \mu_1, q_0, \alpha_0, \mu_0)$ we have $|\alpha_i| \leq 1$ (for $i = 1, 0$).

We abbreviate deterministic 2m's by adding a 'd' to the usual abbreviation; hence we speak of 2dgsm, 2dgsm-rla, and 2dgsm-mso. The families of string transductions realized by these three types of deterministic sequential machines are denoted by 2DGSM, $2\mathrm{DGSM}^{\mathrm{RLA}}$, and $2\mathrm{DGSM}^{\mathrm{MSO}}$, respectively.

Unlike their nondeterministic counterparts ([Kie75]; see also the corresponding lemma in the section on nondeterminism and the remark following it), deterministic 2gsm's are closed under composition, as was demonstrated by Chytil and Jakl. As an essential part of the proof, the fact is used (proved in [HoUl67]) that a 2dgsm can keep track of the state of another (deterministic) one-way finite state automaton working on the same tape (from left to right or from right to left). For the left-to-right case, it is clear how to do this as long as the reading head moves to the right. Backtracking ('undoing' a move) on the occasion of a step to the left needs a rather ingenious back-and-forth simulation of the automaton.
4 Proposition ([ChJa77]). 2DGSM is closed under composition.

In the remainder of this section we show that the three types of deterministic machines defined above are all equivalent, i.e., that $2\mathrm{DGSM} = 2\mathrm{DGSM}^{\mathrm{RLA}} = 2\mathrm{DGSM}^{\mathrm{MSO}}$.

Every 2gsm is of course a simple 2gsm-rla, using trivial look-around tests, i.e., tests of the form $(R_\ell, \sigma, R_r)$ with $R_\ell = {\vdash}\Sigma_1^*$ and $R_r = \Sigma_1^*{\dashv}$ (with the exceptions $R_\ell = \{\lambda\}$ when $\sigma = \vdash$, and $R_r = \{\lambda\}$ when $\sigma = \dashv$).

It follows from Büchi's result, Proposition 1, that any 2gsm-rla can be reinterpreted as a 2gsm-mso by changing the specification of the tests and moves into formulas, as follows.

First, consider a look-around test $t = (R_\ell, \sigma, R_r)$. Let $\varphi_\ell(x)$ be a formula expressing that the string to the left of position x belongs to the regular language $R_\ell$. It can be obtained from a closed formula $\varphi$ defining $R_\ell$ by restricting quantification to the positions to the left of x, i.e., by replacing subformulas $(\exists y)\psi(y)$ by $(\exists y)(y \prec x \wedge \psi(y))$ and $(\exists Y)\psi(Y)$ by $(\exists Y)((\forall y)(y \in Y \rightarrow y \prec x) \wedge \psi(Y))$.

Similarly, we obtain a formula $\varphi_r(x)$ expressing that the string to the right of position x belongs to the regular language $R_r$. Clearly, the test t is equivalent to the formula $\varphi_t(x) = \varphi_\ell(x) \wedge \mathrm{lab}_\sigma(x) \wedge \varphi_r(x)$.
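The restriction of quantification used to build $\varphi_\ell(x)$ is a purely syntactic transformation, sketched below on a tiny formula syntax. Several assumptions apply. $\vee$ and $\forall$ are taken to be written with $\neg$, $\wedge$ and $\exists$. Atomic formulas (including $\prec$) are kept as opaque text. x is assumed not to occur in $\varphi$. The bound variable introduced for set quantifiers uses a hypothetical naming scheme (`Y_elem`) that is assumed not to clash with existing names. The universal guard $(\forall y)(y \in Y \rightarrow y \prec x)$ is encoded as $\neg(\exists y)(y \in Y \wedge \neg(y \prec x))$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    text: str           # e.g. "lab_a(y)" or "y ∈ Y", kept opaque

@dataclass(frozen=True)
class Not:
    sub: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Exists:           # node quantifier (∃y)
    var: str
    body: object

@dataclass(frozen=True)
class ExistsSet:        # node-set quantifier (∃Y)
    var: str
    body: object

def relativize(phi, x):
    """Restrict every quantifier in phi to positions strictly left of x."""
    if isinstance(phi, Atom):
        return phi
    if isinstance(phi, Not):
        return Not(relativize(phi.sub, x))
    if isinstance(phi, And):
        return And(relativize(phi.left, x), relativize(phi.right, x))
    if isinstance(phi, Exists):
        # (∃y)ψ  becomes  (∃y)(y ≺ x ∧ ψ')
        return Exists(phi.var, And(Atom(f"{phi.var} ≺ {x}"), relativize(phi.body, x)))
    if isinstance(phi, ExistsSet):
        # (∃Y)ψ  becomes  (∃Y)((∀y)(y ∈ Y → y ≺ x) ∧ ψ'), with ∀ written as ¬∃¬
        y = phi.var + "_elem"  # hypothetical fresh variable name
        bounded = Not(Exists(y, And(Atom(f"{y} ∈ {phi.var}"), Not(Atom(f"{y} ≺ {x}")))))
        return ExistsSet(phi.var, And(bounded, relativize(phi.body, x)))
    raise TypeError(f"unknown formula: {phi!r}")
```

The guards are added after the body has been relativized, so the new occurrences of $\prec$ are not themselves restricted.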

Finally, one-step moves are easily translated into formulas. A move $\epsilon = +1$ is equivalent to stating that the new position is next to the original one: $\mathrm{edge}_*(x, y)$. Of course, $\epsilon = -1$ is symmetric ($\mathrm{edge}_*(y, x)$), whereas $\epsilon = 0$ is expressed by $x = y$. Note that these formulas are functional.

These observations prove the first relations between the families of trans- 
ductions. 

5 Lemma. $\mathrm{2DGSM} \subseteq \mathrm{2DGSM}^{\mathrm{RLA}} \subseteq \mathrm{2DGSM}^{\mathrm{MSO}}$.

The ability of 2dgsm's to keep track of the state of a one-way finite state automaton (cf. the remark before Proposition 4) is modelled here as regular look-around. Thus, for readers familiar with this feature it should be quite obvious that $\mathrm{2DGSM}^{\mathrm{RLA}} \subseteq \mathrm{2DGSM}$. Here we prove it using Proposition 4.

6 Lemma. $\mathrm{2DGSM}^{\mathrm{RLA}} \subseteq \mathrm{2DGSM}$.
Proof. By Proposition 4, 2DGSM is closed under composition. We prove the lemma by decomposing a given 2dgsm-rla $\mathcal{M}$ into a series of 2dgsm's that together realize the transduction of $\mathcal{M}$.

The final 2dgsm performs the required transduction, whereas all the other transductions 'preprocess the tape' by adding to the original input the outcome of the various tests of $\mathcal{M}$. As we also need this information for the positions containing the end-of-tape markers $\vdash$ and $\dashv$, we start with a transduction that maps input $w$ to the string $\triangleright w \triangleleft$, where $\triangleright$ and $\triangleleft$ are new symbols. Information concerning the end-of-tape positions is added to these new symbols. The other machines may ignore $\vdash$ and $\dashv$, and treat $\triangleright$ and $\triangleleft$ as if they were these end-of-tape markers.

For each look-around test $t = (R_\ell, \sigma, R_r)$ of $\mathcal{M}$ we introduce a 2dgsm $\mathcal{M}_t$ that copies the input while adding to each position the outcome of the test $t$ for that position in the original string (ignoring any additional information that a previous transduction added to the string). The machine $\mathcal{M}_t$ itself can be seen as the work of three consecutive 2dgsm's. The first one, simulating a finite state automaton recognizing $R_\ell$, checks at each position whether the prefix read so far belongs to $R_\ell$, and adds this information to the symbol at that position. The second transducer processes the input from right to left, simulating a finite state automaton for the mirror image of $R_r$, and adds information concerning the suffix. Note that the input has been reversed in the process. This can be undone by another reversal, performed by a third 2dgsm.

Once the value of each look-around test of $\mathcal{M}$ is added to the original input string, the transduction of $\mathcal{M}$ can obviously be simulated by an ordinary 2dgsm. □
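The three-pass construction of $\mathcal{M}_t$ in this proof can be sketched in Python as follows, assuming the look-around test is given by a DFA for $R_\ell$, the symbol $\sigma$, and a DFA for the mirror image of $R_r$. Each pass is an ordinary loop rather than a 2dgsm, and the tape is treated as a plain string (the end markers could be included as ordinary symbols); the DFA encoding and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# DFA = (start state, partial transition map, final states); missing moves go to a dead state.
DFA = Tuple[int, Dict[Tuple[int, str], int], Set[int]]

def prefix_marks(dfa: DFA, tape: str) -> List[bool]:
    """One left-to-right pass: marks[i] is True iff tape[:i] is accepted."""
    start, delta, finals = dfa
    state, marks = start, []
    for symbol in tape:
        marks.append(state in finals)               # prefix strictly left of position i
        state = delta.get((state, symbol), -1)      # -1 is a dead state
    return marks

def annotate(tape: str, left: DFA, sigma: str, right_mirror: DFA) -> List[bool]:
    """Outcome of the look-around test (R_l, sigma, R_r) at every position.
    `right_mirror` must accept the mirror image of R_r."""
    pass1 = prefix_marks(left, tape)                # first 2dgsm: prefixes in R_l
    pass2 = prefix_marks(right_mirror, tape[::-1])  # second 2dgsm: right to left, output reversed
    pass3 = pass2[::-1]                             # third 2dgsm: undo the reversal
    return [p and c == sigma and s for p, c, s in zip(pass1, tape, pass3)]

EVEN_A = (0, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, {0})  # even number of a's
HAS_A = (0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}, {1})   # contains an a (its own mirror)

print(annotate("aabab", EVEN_A, "b", HAS_A))  # [False, False, True, False, False]
```

Here the test asks for a `b` preceded by an even number of `a`s and followed by at least one `a`; only position 2 of `aabab` passes.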

Büchi's result (Proposition 1) allows us to show that the 2gsm-mso can be simulated by the 2gsm-rla. Additionally, we need the following (folklore) result on the structure of certain regular languages (cf. [Pix96, Lemma 8.1]).



7 Lemma. Let $\Delta \subseteq \Sigma$ be alphabets, and let $R \subseteq \Sigma^*$ be a regular language such that each string of $R$ contains exactly one occurrence of a symbol from $\Delta$. Then we may write $R$ as a finite union of disjoint languages $R_\ell \cdot \delta \cdot R_r$, where $\delta \in \Delta$, and $R_\ell, R_r \subseteq (\Sigma - \Delta)^*$ are regular languages.

Proof. Let $\mathcal{A}$ be a deterministic finite automaton accepting $R$. Every path (in the state transition diagram of $\mathcal{A}$) from the initial state to a final state passes exactly one transition labelled by a symbol from $\Delta$. For any such transition $(p, \delta, q)$ of $\mathcal{A}$ let $R_\ell$ consist of all strings that label a path starting in the initial state of $\mathcal{A}$ and ending in $p$, and, symmetrically, let $R_r$ consist of all strings that label a path from $q$ to one of the final states of $\mathcal{A}$. Obviously, $R_\ell$ and $R_r$ are regular, and $R$ is the union of the languages $R_\ell \cdot \delta \cdot R_r$ taken over all such transitions. Since $\mathcal{A}$ is deterministic, these languages are easily seen to be disjoint. □
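The construction in the proof of Lemma 7 yields one piece $R_\ell \cdot \delta \cdot R_r$ per $\Delta$-labelled transition $(p, \delta, q)$. The Python sketch below assumes $\mathcal{A}$ is given as a partial transition dictionary and tests membership in each piece directly, instead of building separate automata for $R_\ell$ and $R_r$; it also makes the restriction $R_\ell, R_r \subseteq (\Sigma - \Delta)^*$ explicit. The example language (exactly one `x` and an even number of `a`s) and all names are our own illustration.

```python
# Partial DFA for R; missing transitions mean rejection.
def run(delta, state, word):
    """Follow the partial DFA from `state`; None if it gets stuck."""
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return None
    return state

def pieces(delta, marked):
    """One piece (p, d, q) per transition labelled by a symbol d from Delta."""
    return [(p, d, q) for (p, d), q in delta.items() if d in marked]

def in_piece(delta, start, finals, marked, piece, word):
    """Is word in R_l . d . R_r, where R_l holds the unmarked words leading from
    the start state to p and R_r the unmarked words leading from q to a final state?"""
    p, d, q = piece
    hits = [i for i, c in enumerate(word) if c in marked]
    if len(hits) != 1 or word[hits[0]] != d:
        return False
    i = hits[0]
    return run(delta, start, word[:i]) == p and run(delta, q, word[i + 1:]) in finals

# Example: exactly one x, and an even number of a's overall.
DELTA = {
    (0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1,   # before the x (state = parity)
    (0, "x"): 2, (1, "x"): 3,                              # the unique x
    (2, "a"): 3, (3, "a"): 2, (2, "b"): 2, (3, "b"): 3,   # after the x
}
START, FINALS, MARKED = 0, {2}, {"x"}

print(pieces(DELTA, MARKED))  # [(0, 'x', 2), (1, 'x', 3)]
```

For this automaton there are two pieces, one per parity of the `a`s before the `x`, and every word of $R$ falls into exactly one of them, as the disjointness claim predicts.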

8 Lemma. $\mathrm{2DGSM}^{\mathrm{MSO}} \subseteq \mathrm{2DGSM}^{\mathrm{RLA}}$.

Proof. We show how to simulate the instructions of a 2gsm-mso by a 2gsm-rla. Recall that such an instruction is specified as $(p, t, q_1, \alpha_1, \mu_1, q_0, \alpha_0, \mu_0)$, where $t$ is a formula $\varphi(x)$ with one free node variable, and the moves $\mu_i$ are (functional) formulas $\varphi_i(x,y)$ with two free node variables.

Tests: unary node predicates. Consider a test $\varphi(x)$ in $\mathrm{MSO}(\Sigma_1 \cup \{\vdash, \dashv\}, *)$. It can easily be simulated by regular look-around tests. Identifying $(\Sigma_1 \cup \{\vdash, \dashv\}) \times \{0,1\}^{\{x\}}$ with $(\Sigma_1 \cup \{\vdash, \dashv\}) \times \{0,1\}$, consider the language $L(\varphi)$, which is regular by Proposition 1. As each string of this language contains exactly one symbol with 1 as its second component, it can be written as a finite union of languages $R_\ell \cdot (\sigma, 1) \cdot R_r$, with regular languages $R_\ell, R_r \subseteq ((\Sigma_1 \cup \{\vdash, \dashv\}) \times \{0\})^*$ and $\sigma \in \Sigma_1 \cup \{\vdash, \dashv\}$, see Lemma 7. This implies that the test $\varphi(x)$ can be simulated by a finite disjunction of the look-around tests $(R'_\ell, \sigma, R'_r)$, where each $R'_\ell, R'_r$ is obtained from the corresponding $R_\ell, R_r$ by dropping the second component (the 0-part) of the symbols. Of course, this disjunction is computed by testing each of its alternatives consecutively.

Moves: binary node predicates. Once the test of an instruction is evaluated, one of its moves is executed, and the output is written. This move is given as a formula $\varphi(x,y)$, specifying a functional relation between the present position $x$ and the next position $y$ on the input. Whereas the 2dgsm-mso may 'jump' to its next position, independent of the relative positions of $x$ and $y$, a 2dgsm-rla can only step to one of the neighbouring positions of the tape, and has to 'walk' to the next position when simulating this jump.

Before starting the excursion from $x$ to $y$ the 2dgsm-rla determines the direction (left, right, or stay) by evaluating the tests $(\exists y)(y \prec x \wedge \varphi(x,y))$, $(\exists y)(x \prec y \wedge \varphi(x,y))$, and $(\exists y)(x = y \wedge \varphi(x,y))$, using the method explained above. Since $\varphi(x,y)$ is functional, at most one of these tests is true.
In the sequel we assume that our target position $y$ lies to the left of the present position $x$, i.e., the test $(\exists y)(y \prec x \wedge \varphi(x,y))$ is true. The right case can be treated in an analogous way; the stay case is trivial.

Similarly to the case of tests, identify $(\Sigma_1 \cup \{\vdash, \dashv\}) \times \{0,1\}^{\{x,y\}}$ with $(\Sigma_1 \cup \{\vdash, \dashv\}) \times \{0,1\}^2$, and consider $L(y \prec x \wedge \varphi(x,y))$. Each string of this language contains exactly one symbol with $(0,1)$ as its second component, the position of $y$, and it precedes a unique symbol with $(1,0)$ as its second component, the position of $x$; all other symbols carry $(0,0)$. It can be written as a finite disjoint union of languages $R_\ell \cdot (\sigma, 0, 1) \cdot R_m \cdot (\tau, 1, 0) \cdot R_r$, with regular languages $R_\ell, R_m, R_r \subseteq ((\Sigma_1 \cup \{\vdash, \dashv\}) \times \{(0,0)\})^*$ and $\sigma, \tau \in \Sigma_1 \cup \{\vdash, \dashv\}$, by applying Lemma 7 twice.

Our moves are functional, meaning that there is a unique position $y$ that satisfies the predicate $\varphi(x,y)$, with $x$ the present position. Still before starting the excursion from $x$ to the new position $y$, the 2dgsm-rla determines which language in the union above describes this position by performing the regular look-around tests $(R'_\ell \cdot \sigma \cdot R'_m, \tau, R'_r)$, where each $R'_\ell, R'_m, R'_r$ is obtained from the corresponding $R_\ell, R_m, R_r$ by deleting the second component (the $(0,0)$-part) of the symbols.

The 2dgsm-rla now moves to the left. In each step it checks whether the segment of the input string between the present position (candidate $y$) and the starting position (corresponding to $x$) belongs to the regular language $R'_m$. This can be done by simulating a finite automaton for (the mirror image of) $R'_m$ in the finite state control.

Each time this segment belongs to $R'_m$, it performs the rla-test $(R'_\ell, \sigma, \Sigma_1^*\dashv)$ to verify the requirement on the initial segment of the input. Once this last test is satisfied, it has found the position $y$ and writes the output string. □
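The walk to the left can be illustrated with a small Python sketch. It assumes the look-around test at $x$ has already selected a piece, that $R'_\ell$ and $R'_m$ are given as regular expressions, and it uses `re.fullmatch` as a stand-in for the finite automata a 2dgsm-rla would run in its finite control (in particular, the incremental automaton for the mirror image of $R'_m$). The function name and the example move, 'go to the nearest `b` to the left', are hypothetical.

```python
import re
from typing import Optional

def find_target_left(tape: str, x: int, left_re: str, sigma: str, middle_re: str) -> Optional[int]:
    """Walk left from x and return the first position y such that the segment
    tape[y+1:x] is in R'_m, tape[y] == sigma and tape[:y] is in R'_l."""
    for y in range(x - 1, -1, -1):                        # one step to the left at a time
        if not re.fullmatch(middle_re, tape[y + 1:x]):    # segment strictly between y and x
            continue
        if tape[y] == sigma and re.fullmatch(left_re, tape[:y]):  # rla-test at y
            return y
    return None

# Hypothetical move "go to the nearest b to the left": R'_l = [ab]*, sigma = b, R'_m = a*.
print(find_target_left("abaab", 4, "[ab]*", "b", "a*"))  # 1
```

For that move, $R'_\ell = \{a,b\}^*$, $\sigma = b$ and $R'_m = a^*$; by functionality, the first candidate passing all checks is the unique target.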

We summarize.

9 Theorem. $\mathrm{2DGSM} = \mathrm{2DGSM}^{\mathrm{RLA}} = \mathrm{2DGSM}^{\mathrm{MSO}}$.

A similar result can be obtained for nondeterministic gsm's by the same line of reasoning. However, in Lemma 6 we need the inclusion $\mathrm{2DGSM} \circ \mathrm{2NGSM} \subseteq \mathrm{2NGSM}$ rather than $\mathrm{2DGSM} \circ \mathrm{2DGSM} \subseteq \mathrm{2DGSM}$ (Proposition 4). This new inclusion can be proved like the latter one [ChJa77].



3 MSO Definable String Transductions 

As explained in the Preliminaries, we consider mso logic on graphs as a means 
of specifying string transductions, rather than dealing directly with strings. 

Although we are mainly interested in graph transductions that have 
string-like graphs as their domain and range, occasionally we find it useful 
to allow more general graphs as intermediate products of our constructions. 

In this section we recall the definition of mso graph transductions, and 
from it we derive two families of mso definable string transductions, which 
differ in the way strings are represented by graphs. We present basic exam- 
ples, and characterize the relation between the two families we have defined. 

We start with the general definition. 

An mso definable transduction [Cou91, Cou94, Eng91a, EnOo97, See92] is a (partial) function that constructs for a given input graph a new output graph as specified by a number of mso formulas. Here we consider the deterministic (or 'parameterless') mso transductions of [Cou94]. For a graph satisfying a given domain formula $\varphi_{\mathrm{dom}}$ we take copies of each of the nodes, one for each element of a finite copy set $C$. The label of the $c$-copy of node $x$ ($c \in C$) is determined by a set of formulas $\varphi_\sigma^c(x)$, one for each symbol $\sigma$ in the output alphabet. We keep only those copies of the nodes for which exactly one of the label formulas is true. Edges are defined according to formulas $\varphi_\gamma^{c_1,c_2}(x,y)$: we construct an edge with label $\gamma$ in the output graph from the $c_1$-copy of $x$ to the $c_2$-copy of $y$ whenever such a formula holds.

10 Definition. An mso definable (graph) transduction $\tau : \mathrm{GR}(\Sigma_1, \Gamma_1) \to \mathrm{GR}(\Sigma_2, \Gamma_2)$ is specified by

- a closed domain formula $\varphi_{\mathrm{dom}}$,

- a finite copy set $C$,

- node formulas $\varphi_\sigma^c(x)$, with one free node variable $x$, for every $\sigma \in \Sigma_2$ and every $c \in C$, and

- edge formulas $\varphi_\gamma^{c_1,c_2}(x,y)$, with two free node variables $x, y$, for every $\gamma \in \Gamma_2$ and all $c_1, c_2 \in C$,

where all formulas are in $\mathrm{MSO}(\Sigma_1, \Gamma_1)$.

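For $g \in \mathrm{GL}(\varphi_{\mathrm{dom}})$ with node set $V_g$, the image $\tau(g)$ is the graph $(V, E, \ell)$, defined as follows. We will write $u^c$ rather than $(u, c)$ for elements of $V_g \times C$.

$V = \{u^c \mid u \in V_g,\ c \in C,\ \text{there is exactly one } \sigma \in \Sigma_2 \text{ such that } g \models \varphi_\sigma^c(u)\}$,

$E = \{(u^{c_1}, \gamma, v^{c_2}) \mid u^{c_1}, v^{c_2} \in V,\ \gamma \in \Gamma_2,\ g \models \varphi_\gamma^{c_1,c_2}(u, v)\}$, and

$\ell(u^c) = \sigma$ if $g \models \varphi_\sigma^c(u)$, for $u^c \in V$, $\sigma \in \Sigma_2$.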
11 Example. Let $\Sigma = \{a, b\}$. As a simple example we present an mso graph transduction from $\mathrm{GR}(\Sigma, *)$ to $\mathrm{GR}(*, \{a, b, *\})$ that transforms a linear graph representing a string into a ladder, while moving the symbols from the nodes to the steps.

Domain formula $\varphi_{\mathrm{dom}}$ expresses that the input graph is a string representation (see the end of Section 1).

The copy set C is {1, 2}. 

Each node is copied twice: $\varphi_*^1 = \varphi_*^2 = \mathrm{true}$.

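Unlabelled edges are copied twice, one of these in reverse:

$\varphi_*^{1,1}(x,y) = \mathrm{edge}_*(x,y)$, $\varphi_*^{2,2}(x,y) = \mathrm{edge}_*(y,x)$, $\varphi_*^{1,2} = \varphi_*^{2,1} = \mathrm{false}$.

Labelled edges are introduced:

$\varphi_\sigma^{1,2}(x,y) = (x = y) \wedge \mathrm{lab}_\sigma(x)$, $\varphi_\sigma^{1,1} = \varphi_\sigma^{2,1} = \varphi_\sigma^{2,2} = \mathrm{false}$, for $\sigma = a, b$.

□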
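Definition 10 can be read as a generic procedure. The following Python sketch, under the assumption that each mso formula is replaced by an ordinary Python predicate on the input graph (so nothing here checks mso definability), computes $\tau(g)$ exactly as in the definition and instantiates it with the ladder transduction of Example 11. The class names, the simplified domain check `is_string_graph`, and the copy names `"1"` and `"2"` are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

@dataclass
class Graph:
    labels: Dict[Hashable, str]                      # node -> label, "*" means unlabelled
    edges: Set[Tuple[Hashable, str, Hashable]] = field(default_factory=set)

    def edge(self, x, gamma, y) -> bool:
        return (x, gamma, y) in self.edges

@dataclass
class MsoSpec:
    """Definition 10, with every formula replaced by a Python predicate."""
    domain: Callable[[Graph], bool]
    copies: Tuple[str, ...]
    node: Dict[Tuple[str, str], Callable]            # (sigma, c) -> phi(g, x)
    edge: Dict[Tuple[str, str, str], Callable]       # (gamma, c1, c2) -> phi(g, x, y)

def apply_spec(spec: MsoSpec, g: Graph) -> Optional[Graph]:
    if not spec.domain(g):
        return None                                   # g is outside the domain
    labels = {}
    for u in g.labels:
        for c in spec.copies:
            hits = [s for (s, cc), phi in spec.node.items() if cc == c and phi(g, u)]
            if len(hits) == 1:                        # keep u^c iff exactly one label holds
                labels[(u, c)] = hits[0]
    edges = {((u, c1), gamma, (v, c2))
             for (gamma, c1, c2), phi in spec.edge.items()
             for u in g.labels for v in g.labels
             if (u, c1) in labels and (v, c2) in labels and phi(g, u, v)}
    return Graph(labels, edges)

def nd_gr(w: str) -> Graph:
    return Graph({i: c for i, c in enumerate(w)}, {(i, "*", i + 1) for i in range(len(w) - 1)})

def is_string_graph(g: Graph) -> bool:
    """Simplified stand-in for phi_dom: accepts exactly the graphs built by nd_gr."""
    n = len(g.labels)
    return set(g.labels) == set(range(n)) and g.edges == {(i, "*", i + 1) for i in range(n - 1)}

def rung(sigma: str) -> Callable:
    return lambda g, x, y: x == y and g.labels[x] == sigma

LADDER = MsoSpec(
    domain=is_string_graph,
    copies=("1", "2"),
    node={("*", "1"): lambda g, x: True, ("*", "2"): lambda g, x: True},
    edge={
        ("*", "1", "1"): lambda g, x, y: g.edge(x, "*", y),   # copy the edge
        ("*", "2", "2"): lambda g, x, y: g.edge(y, "*", x),   # copy it in reverse
        ("a", "1", "2"): rung("a"),                           # label moves to the rung
        ("b", "1", "2"): rung("b"),
    },
)

ladder = apply_spec(LADDER, nd_gr("ab"))
print(sorted(ladder.edges, key=str))
```

Formulas omitted from the dictionaries are treated as false, matching the convention used in the example.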



The family of mso definable graph transductions is denoted by grMSO. Its basic properties are summarized below; see, e.g., [Cou97, Prop. 5.5.6].
12 Proposition.

1. grMSO is closed under composition. 

2. The mso definable graph languages are closed under inverse mso defin- 
able graph transductions. 

We now consider mso definable graph transductions as a tool to specify string transductions.

There are two equally natural (and well-known) ways of representing a string as a graph. First, as we have seen in the Preliminaries, for a string $w \in \Sigma^*$ of length $k$, we may represent $w$ by the graph nd-gr(w) in $\mathrm{GR}(\Sigma, *)$, consisting of $k$ nodes labelled by the consecutive symbols of $w$, and $k - 1$ (unlabelled) edges representing the successor relation for the positions of the string. Dually, $w$ can be represented by the graph ed-gr(w) in $\mathrm{GR}(*, \Sigma)$, consisting of $k + 1$ (unlabelled) nodes, connected by $k$ edges that form a path labelled by the symbols of $w$. In the figure below we show ed-gr(ababb). Note that ed-gr(λ) consists of one unlabelled node.
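Both encodings are easy to construct explicitly. In this Python sketch a graph is a pair (node labels, labelled edges), with `"*"` standing for 'unlabelled'; the representation and the function names `nd_gr` and `ed_gr` are assumptions chosen to mirror the notation of the text.

```python
from typing import Dict, Set, Tuple

Graph = Tuple[Dict[int, str], Set[Tuple[int, str, int]]]  # (node labels, labelled edges)

def nd_gr(w: str) -> Graph:
    """k nodes labelled by the symbols of w, k - 1 unlabelled successor edges."""
    return {i: c for i, c in enumerate(w)}, {(i, "*", i + 1) for i in range(len(w) - 1)}

def ed_gr(w: str) -> Graph:
    """k + 1 unlabelled nodes on a path whose k edges spell w."""
    return {i: "*" for i in range(len(w) + 1)}, {(i, c, i + 1) for i, c in enumerate(w)}

print(nd_gr(""), ed_gr(""))  # ({}, set()) ({0: '*'}, set())
```

The empty string gives the empty graph under `nd_gr` but a single node under `ed_gr`.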

It will turn out that the 'edge graph representation' of strings is more 
naturally related to two-way machines than the 'node graph representation'. 

13 Definition. 

1. Let $\Sigma_1, \Sigma_2$ be two alphabets, and let $m \subseteq \Sigma_1^* \times \Sigma_2^*$ be a string transduction.

i. Its translation to graphs $\{(\text{ed-gr}(w), \text{ed-gr}(z)) \mid (w, z) \in m\}$ in $\mathrm{GR}(*, \Sigma_1) \times \mathrm{GR}(*, \Sigma_2)$ is denoted by ed-gr(m);

ii. its translation to graphs $\{(\text{nd-gr}(w), \text{nd-gr}(z)) \mid (w, z) \in m\}$ in $\mathrm{GR}(\Sigma_1, *) \times \mathrm{GR}(\Sigma_2, *)$ is denoted by nd-gr(m).

2. MSOS denotes the family of all string transductions $m$ such that ed-gr(m) belongs to grMSO, and $\mathrm{MSOS}_{\mathrm{nd}}$ denotes the family of all string transductions $m$ such that nd-gr(m) belongs to grMSO.

□
Figure 3: Edge representation for (a^b'^aba, a^b^aba), cf. Example 14



A transduction in MSOS is called an mso definable string transduction, and a transduction in $\mathrm{MSOS}_{\mathrm{nd}}$ is called a λ-restricted mso definable string transduction. The reason for this terminology will be explained in Lemma 18.



14 Example. Consider the transduction ed-gr(m), where m is the string transduction from Example ^,

$\{(a^{i_1} b a^{i_2} b \cdots a^{i_n} b a^{i_{n+1}},\ a^{i_1} b^{i_1} a^{i_2} b^{i_2} \cdots a^{i_n} b^{i_n} a^{i_{n+1}}) \mid n \geq 0,\ i_1, \ldots, i_{n+1} \geq 0\}$.

The formulas for the construction of the output graph have nodes as their reference points, whereas the information (symbols) is attached to the edges. Hence we frequently use the formula $\mathrm{out}_\sigma(x) = (\exists y)\,\mathrm{edge}_\sigma(x,y)$.

As in Example ^ we have an expression $\mathrm{fs}'_a(x,y)$ denoting the first node in the present segment of $a$'s, this time referring to outgoing edges:

$y \preceq x \wedge (\forall z)(y \preceq z \wedge z \preceq x \rightarrow \mathrm{out}_a(z)) \wedge \neg(\exists z)(\mathrm{edge}_a(z,y))$

Similarly, we have the edge variant $\mathrm{next}'_b(x,y)$, obtained by replacing the subformulas $\mathrm{lab}_a(y)$ by $\mathrm{out}_a(y)$ in the original formula $\mathrm{next}_b(x,y)$.

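Choosing the copy set $C = \{1, 2, 3\}$, and the domain formula defining edge representations of strings, the transduction ed-gr(m) is defined by the following formulas.

$\varphi_*^1(x) = \mathrm{out}_a(x)$

$\varphi_*^2(x) = \mathrm{out}_a(x) \wedge (\exists y)(x \prec y \wedge \mathrm{out}_b(y))$

$\varphi_*^3(x) = \neg\mathrm{out}_a(x) \wedge \neg\mathrm{out}_b(x)$, the final node of the string,

$\varphi_a^{1,1}(x,y) = \varphi_a^{1,3}(x,y) = \mathrm{edge}_a(x,y)$

$\varphi_a^{1,2}(x,y) = (\exists z)(\mathrm{edge}_a(x,z) \wedge \neg\mathrm{out}_a(z)) \wedge \mathrm{fs}'_a(x,y)$

$\varphi_b^{2,2}(x,y) = \mathrm{edge}_a(x,y)$

$\varphi_b^{2,1}(x,y) = (\exists z)(\mathrm{edge}_a(x,z) \wedge \neg\mathrm{out}_a(z)) \wedge \mathrm{next}'_b(x,y)$

$\varphi_b^{2,3}(x,y) = \neg(\exists z)(\varphi_b^{2,2}(x,z) \vee \varphi_b^{2,1}(x,z))$

$\varphi_\sigma^{3,j} = \mathrm{false}$, for $j = 1, 2, 3$.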

The construction is illustrated in Figure 3 for (a^b'^aba, a^b^aba) $\in m$. Note that we have put the copy numbers within the nodes. □

The transition from one graph representation to the other is (essentially) definable as an mso graph transduction, and will be heavily used in the sequel. We discuss this in the next example.

15 Example. The graph transduction $\mathrm{ed2nd} = \{(\text{ed-gr}(w), \text{nd-gr}(w)) \mid w \in \Sigma^*\} : \mathrm{GR}(*, \Sigma) \to \mathrm{GR}(\Sigma, *)$ from the edge representation of a string into its node representation is mso definable, as follows.

- $\varphi_{\mathrm{dom}}$ expresses that the input is a string representation, an edge-labelled path (consisting of at least one node);

- the copy set $C$ equals $\{1\}$;

- $\varphi_\sigma^1(x) = (\exists y)(\mathrm{edge}_\sigma(x,y))$, i.e., the label $\sigma$ is moved from the edge to its source node. None of these formulas is true for the final node of the input graph, which means that this node is not copied;

- $\varphi_*^{1,1}(x,y) = \bigvee_{\sigma \in \Sigma} \mathrm{edge}_\sigma(x,y)$, i.e., edges are copied, without their labels.

The inverse mapping $\mathrm{ed2nd}^{-1} = \{(\text{nd-gr}(w), \text{ed-gr}(w)) \mid w \in \Sigma^*\} : \mathrm{GR}(\Sigma, *) \to \mathrm{GR}(*, \Sigma)$ is not mso definable: the representation nd-gr(λ) of the empty string has no nodes that can be copied to obtain the single node of ed-gr(λ).

If we omit the empty string, the graph transduction $\mathrm{nd2ed} = \{(\text{nd-gr}(w), \text{ed-gr}(w)) \mid w \in \Sigma^*, w \neq \lambda\}$ can be defined as follows.

- $\varphi_{\mathrm{dom}}$ again expresses that the input is a string representation, a (nonempty) node-labelled path;

- the copy set equals $\{1, 2\}$;

- $\varphi_*^1(x) = \mathrm{true}$, $\varphi_*^2(x) = \neg(\exists y)(\mathrm{edge}_*(x,y))$, i.e., all nodes are copied once, except the last one, which gets two copies;

- $\varphi_\sigma^{1,1}(x,y) = \mathrm{edge}_*(x,y) \wedge \mathrm{lab}_\sigma(x)$, i.e., the label is moved from the node to its outgoing edge;

- $\varphi_\sigma^{1,2}(x,y) = (x = y) \wedge \mathrm{lab}_\sigma(x)$, which deals with the last edge;

- $\varphi_\sigma^{2,1} = \varphi_\sigma^{2,2} = \mathrm{false}$.

□
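The two mappings of Example 15 can be sketched in Python on a simple (labels, edges) encoding, assuming integer node names for the input and pairs (node, copy) for the nodes created by `nd2ed`. The helper `spell`, which reads a string back from an edge representation, is our own addition for checking results; none of these names come from the paper.

```python
def nd_gr(w):
    return {i: c for i, c in enumerate(w)}, {(i, "*", i + 1) for i in range(len(w) - 1)}

def ed_gr(w):
    return {i: "*" for i in range(len(w) + 1)}, {(i, c, i + 1) for i, c in enumerate(w)}

def ed2nd(labels, edges):
    """Copy set {1}: a node survives iff it has an outgoing edge, whose label it takes;
    edges between surviving nodes are copied without their labels."""
    out_label = {u: s for (u, s, _) in edges}
    new_labels = {u: out_label[u] for u in labels if u in out_label}
    new_edges = {(u, "*", v) for (u, _, v) in edges if u in new_labels and v in new_labels}
    return new_labels, new_edges

def nd2ed(labels, edges):
    """Copy set {1, 2}: every node gets copy 1, the last node also copy 2; each label
    moves to the outgoing edge of copy 1 (for the last node, the edge to its copy 2)."""
    if not labels:
        raise ValueError("nd-gr of the empty string has no node to copy")
    has_succ = {u for (u, _, _) in edges}
    last = [u for u in labels if u not in has_succ]
    new_labels = {(u, 1): "*" for u in labels}
    new_labels.update({(u, 2): "*" for u in last})
    new_edges = {((u, 1), labels[u], (v, 1)) for (u, _, v) in edges}
    new_edges |= {((u, 1), labels[u], (u, 2)) for u in last}
    return new_labels, new_edges

def spell(labels, edges):
    """Read the string back from an edge representation (a labelled path)."""
    succ = {u: (s, v) for (u, s, v) in edges}
    targets = {v for (_, _, v) in edges}
    (node,) = [u for u in labels if u not in targets]   # the unique start node
    out = []
    while node in succ:
        s, node = succ[node]
        out.append(s)
    return "".join(out)

print(ed2nd(*ed_gr("ab")))          # ({0: 'a', 1: 'b'}, {(0, '*', 1)})
print(spell(*nd2ed(*nd_gr("ab"))))  # ab
```

`nd2ed` raises an error on the empty graph, mirroring the observation that $\mathrm{ed2nd}^{-1}$ is not mso definable.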

The above example illustrates an important technical point: every mso graph transduction maps the empty graph to itself (provided it belongs to the domain). This means that, when using the node encoding nd-gr for strings, the empty string can only be mapped to itself. As we do not want to restrict ourselves to this kind of transduction, we have chosen to consider both variants of mso definable string transductions. Although nd-gr(w) is a slightly more direct graph representation of the string $w$ in terms of its positions and their successor relation, the advantage of ed-gr(w) is that it is never empty and thus satisfies all the usual logical laws.

The transition from node representation to edge representation for strings does not influence the validity of Büchi's result.

16 Proposition. A string language $K \subseteq \Sigma^*$ is regular iff there is a closed formula $\varphi$ of $\mathrm{MSO}(*, \Sigma)$ such that $K = \{w \in \Sigma^* \mid \text{ed-gr}(w) \models \varphi\}$.

Proof. Rather direct, using Büchi's result (Proposition 1(2)) and Proposition 12(2). We consider one implication (from right to left) only.

Let the string language $K \subseteq \Sigma^*$ be defined by the closed formula $\varphi$ of $\mathrm{MSO}(*, \Sigma)$, as in the statement of the proposition (using the edge representation). We show that there exists a formula defining $K$ using the node representation. Consider the mso definable graph transduction nd2ed mapping nd-gr(w) to ed-gr(w) for all nonempty $w \in \Sigma^*$, cf. Example 15. The graph language $\mathrm{nd2ed}^{-1}(\mathrm{GL}(\varphi)) = \{\text{nd-gr}(w) \mid w \in \Sigma^*, w \neq \lambda, \text{ed-gr}(w) \models \varphi\}$ is mso definable, say by an mso formula $\psi$ of $\mathrm{MSO}(\Sigma, *)$. It defines the string language $L(\psi) = \{w \in \Sigma^* \mid \text{nd-gr}(w) \models \psi\} = \{w \in \Sigma^* \mid \text{ed-gr}(w) \models \varphi, w \neq \lambda\} = K - \{\lambda\}$. If $\lambda \notin K$, then we are done; otherwise, consider $L(\psi \vee \neg(\exists x)\mathrm{true})$.

□
The families $\mathrm{MSOS}_{\mathrm{nd}}$ and MSOS are equal, up to a small technicality involving the empty string, a point already illustrated in Example 15 and in the proof of Proposition 16.

To prove this, we use the following basic fact (cf. [Cou94, Proposition 3.3]).

17 Lemma. Let $\tau_1$ and $\tau_2$ be mso definable graph transductions from $\mathrm{GR}(\Sigma_1, \Gamma_1)$ to $\mathrm{GR}(\Sigma_2, \Gamma_2)$.

If $\tau_1$ and $\tau_2$ have disjoint domains, then also $\tau_1 \cup \tau_2 \in \mathrm{grMSO}$.

Proof. Consider $\tau_i$ fixed by the copy set $C_i$ and formulas $\varphi_{\mathrm{dom},i}$, $\varphi_{\sigma,i}^c$, and $\varphi_{\gamma,i}^{c_1,c_2}$. We may assume that $C_1$ and $C_2$ are disjoint.

The domain formula for the union is the disjunction $\varphi_{\mathrm{dom},1} \vee \varphi_{\mathrm{dom},2}$; its copy set is $C = C_1 \cup C_2$.

The node formulas and the edge formulas for both transductions are also taken together (by disjunction), but we ensure that they are applicable only for the appropriate input by changing $\varphi_{\sigma,i}^c$ to $\varphi_{\mathrm{dom},i} \wedge \varphi_{\sigma,i}^c$, and similarly for the edge formulas. We add $\varphi_\gamma^{c_1,c_2} = \varphi_\gamma^{c_2,c_1} = \mathrm{false}$ for $c_1 \in C_1$, $c_2 \in C_2$, $\gamma \in \Gamma_2$.

□

18 Lemma. Let $m \subseteq \Sigma_1^* \times \Sigma_2^*$ be a string transduction. Then $m \in \mathrm{MSOS}_{\mathrm{nd}}$ iff $m \in \mathrm{MSOS}$ and $(\lambda, z) \in m$ implies $z = \lambda$.

Proof. (1) From left to right; assume $m \in \mathrm{MSOS}_{\mathrm{nd}}$, i.e., $\text{nd-gr}(m) \in \mathrm{grMSO}$. We split $m$ into the mappings $\hat m = \{(w, z) \in m \mid z \neq \lambda\}$ and $m_\lambda = \{(w, z) \in m \mid z = \lambda\}$.

As $\text{nd-gr}(m) \in \mathrm{grMSO}$, also $\text{ed-gr}(\hat m) = \mathrm{ed2nd} \circ \text{nd-gr}(m) \circ \mathrm{nd2ed}$ is mso definable, by Proposition 12(1).

By Proposition 12(2), the domain of $\text{ed-gr}(m_\lambda)$ is mso definable, as it is the inverse image of $\{\text{nd-gr}(\lambda)\}$ under the transduction $\mathrm{ed2nd} \circ \text{nd-gr}(m)$. Now it is easily seen that $\text{ed-gr}(m_\lambda) \in \mathrm{grMSO}$, using for $\varphi_{\mathrm{dom}}$ the formula defining the domain of $\text{ed-gr}(m_\lambda)$, $C = \{1\}$, $\varphi_*^1 = \neg(\exists y)\mathrm{edge}(x,y)$, and $\varphi_\sigma^{1,1} = \mathrm{false}$.

The union $\text{ed-gr}(m) = \text{ed-gr}(\hat m) \cup \text{ed-gr}(m_\lambda)$ is mso definable by Lemma 17. Hence, $m \in \mathrm{MSOS}$. We have already discussed that the image of λ under $m$ must be λ (provided λ belongs to the domain of $m$), as nd-gr(λ) has no nodes to copy.

(2) From right to left; assume $m \in \mathrm{MSOS}$, i.e., $\text{ed-gr}(m) \in \mathrm{grMSO}$. Then also $\text{nd-gr}(\hat m) = \mathrm{nd2ed} \circ \text{ed-gr}(m) \circ \mathrm{ed2nd}$ is mso definable, where $\hat m = m - \{(\lambda, \lambda)\}$.

We are done when λ does not belong to the domain of $m$. Otherwise, as the transduction $\{(\text{nd-gr}(\lambda), \text{nd-gr}(\lambda))\}$, mapping the empty graph to itself, is easily seen to be mso definable, $\text{nd-gr}(m) \in \mathrm{grMSO}$ follows by Lemma 17.

□



We finally observe that it immediately follows from Proposition 12(1) that MSOS is closed under composition. Together with the closure under composition of 2DGSM (Proposition 4), this has been a strong indication of the equality of these two families, proved in the next section.
4 Logic and Machines

In this section we establish our main result, the equivalence of the deterministic two-way sequential machines from Section 2 and the mso definable string transductions from Section 3: MSOS = 2DGSM.

The first steps towards this result were already taken in Section 2, when we introduced the 2gsm with mso instructions and showed its equivalence to the basic two-way generalized sequential machine.

One technical notion that will be essential to bridge the final gap between logic and machine is modelled after Figure 1 in Example ^. That figure depicts the computation of a 2gsm on a given input string. The input string $w$ can naturally be represented by $\text{nd-gr}(\vdash w \dashv)$, with nodes corresponding to positions on the tape. On the other hand, the output string $z$ is represented as $\text{ed-gr}(z')$, where the edges conveniently correspond to steps of the 2gsm from one position to another (and where $z$ is obtained from $z'$ by erasing λ, i.e., by removing the unlabelled edges).

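We introduce a notation for this representation. Let $m : \Sigma_1^* \to \Sigma_2^*$ be a string transduction. We use tape(m) to denote the graph transduction $\{(\text{nd-gr}(\vdash w \dashv), \text{ed-gr}(z)) \mid (w, z) \in m\}$ from $\mathrm{GR}(\Sigma_1 \cup \{\vdash, \dashv\}, *)$ to $\mathrm{GR}(*, \Sigma_2)$.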

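19 Example. Consider the transduction tape(m), where m is the string transduction from Example ^,

$\{(a^{i_1} b a^{i_2} b \cdots a^{i_n} b a^{i_{n+1}},\ a^{i_1} b^{i_1} a^{i_2} b^{i_2} \cdots a^{i_n} b^{i_n} a^{i_{n+1}}) \mid n \geq 0,\ i_1, \ldots, i_{n+1} \geq 0\}$.

Previously we have shown that $m \in$ 2DGSM; here we demonstrate that tape(m) is an mso definable graph transduction.

Recall the predicate $\mathrm{next}_b(x,y)$ from Example ^.

For tape(m) the domain formula specifies linear graphs of the form $\text{nd-gr}(\vdash w \dashv)$, $w \in \{a, b\}^*$, the copy set $C$ is $\{1, 3, 5\}$, and we have formulas

$\varphi_*^1(x) = \mathrm{lab}_a(x)$,

$\varphi_*^3(x) = \mathrm{lab}_a(x) \wedge (\exists y)(x \prec y \wedge \mathrm{lab}_b(y))$,

$\varphi_*^5(x) = \mathrm{lab}_\dashv(x)$,

$\varphi_a^{1,1}(x,y) = \mathrm{edge}_*(x,y)$,

$\varphi_a^{1,3}(x,y) = (x = y) \wedge \neg(\exists z)(\mathrm{edge}_*(x,z) \wedge \mathrm{lab}_a(z))$,

$\varphi_a^{1,5}(x,y) = \mathrm{edge}_*(x,y)$,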
Figure 4: Mso transduction tape(m) from Example 19

$\varphi_b^{3,3}(x,y) = \mathrm{edge}_*(y,x)$,

$\varphi_b^{3,1}(x,y) = (\exists z)(\mathrm{next}_b(x,z) \wedge \mathrm{next}_a(z,y)) \wedge \neg(\exists z)(\mathrm{edge}_*(z,x) \wedge \mathrm{lab}_a(z))$, i.e., connect to the first $a$ of the next segment when we are at the first $a$ of the present segment,

$\varphi_b^{3,5}(x,y) = \neg(\exists z)(\varphi_b^{3,3}(x,z) \vee \varphi_b^{3,1}(x,z))$,

$\varphi_\sigma^{c_1,c_2} = \mathrm{false}$, in all other cases.

Note that the output of the transduction (cf. the lower graph in Figure 4) is obtained by contracting unlabelled paths in the computation graph of the 2dgsm from Example ^, Figure ^. □

The observation from the example is generally true: a string transduction 
m is realized by a 2dgsm if and only if its graph representation tape(m) is 
mso definable. We prove the two implications separately. 

20 Lemma. Let $m : \Sigma_1^* \to \Sigma_2^*$ be a string transduction.
If $m \in$ 2DGSM, then tape(m) $\in$ grMSO.

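Proof. Let $\mathcal{M} = (Q, \Sigma_1, \Sigma_2, \delta, q_{in}, q_f)$ be a 2dgsm realizing the string transduction $m : \Sigma_1^* \to \Sigma_2^*$, and consider a fixed input string $w = \sigma_1 \cdots \sigma_n$, $\sigma_i \in \Sigma_1$ for $i = 1, \ldots, n$. Additionally we use $\sigma_0 = \vdash$ and $\sigma_{n+1} = \dashv$.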

We can visualize the 'computation space' of $\mathcal{M}$ on $w$ by constructing a graph $\gamma_\mathcal{M}(w)$ that has as its nodes the pairs $(p, i)$, where $p$ is a state of $\mathcal{M}$ and $i \in \{0, 1, \ldots, n, n+1\}$ is one of the positions of the input tape carrying $\vdash w \dashv$. The edges of $\gamma_\mathcal{M}(w)$ are chosen in accordance with the instruction set $\delta$ of $\mathcal{M}$: for each instruction $t = (p, \sigma, q_1, \alpha_1, \varepsilon_1, q_0, \alpha_0, \varepsilon_0)$ in $\delta$ there is an edge from $(p, i)$ to $(q_1, i + \varepsilon_1)$ if $\sigma_i$ equals $\sigma$, and an edge from $(p, i)$ to $(q_0, i + \varepsilon_0)$ otherwise. The edge is labelled by the corresponding output symbol $\alpha_1$ or $\alpha_0$ in $\Sigma_2 \cup \{\lambda\}$. In this context we consider λ as a labelling symbol (rather than as a string of length zero) in order to avoid notational complications.

Figure 5: Computation space for the 2dgsm $\mathcal{M}$ in Example ^

In Figure 5 we illustrate the computation space for the 2dgsm from Example ^ on the input shown there (with output λ omitted, as usual). The computation on that input is represented as a bold path (cf. Figure 1).

As $\mathcal{M}$ is deterministic, every node of $\gamma_\mathcal{M}(w)$ has at most one outgoing edge. The output of the computation of $\mathcal{M}$ on $w$ can then be read from $\gamma_\mathcal{M}(w)$ by starting in node $(q_{in}, 0)$, representing $\mathcal{M}$ in its initial configuration, and following the path along the outgoing edges. The computation is successful if it ends in a final configuration $(q_f, k)$. We will mark the initial and final nodes of $\gamma_\mathcal{M}(w)$ by the special labels $\triangleright$ and $\triangleleft$; the other nodes remain unlabelled (represented in our specification by '*').

Note that the graph $\gamma_\mathcal{M}(w)$ represents not only the computation of $\mathcal{M}$ on $w$ starting in the initial state at the 0-th position of the tape (marked by $\vdash$), but rather all possible computations that result from placing $\mathcal{M}$ on an arbitrary position of the tape, in an arbitrary state.

We construct a series of mso graph transductions, the composition of which maps $\text{nd-gr}(\vdash w \dashv)$ to ed-gr(z) for each $(w, z) \in m$. As grMSO is closed under composition (Proposition 12), this proves the lemma.

The first graph transduction $\tau_1$ maps $\text{nd-gr}(\vdash w \dashv)$ to $\gamma_\mathcal{M}(w)$. The second graph transduction $\tau_2$ selects the path in $\gamma_\mathcal{M}(w)$ corresponding to the successful computation of $\mathcal{M}$ on $w$ (if it exists) by keeping only those nodes that are reachable from the initial configuration and lead to a final configuration. The last graph transduction $\tau_3$ removes edges labelled by λ (used as a symbol representing the empty string) while contracting paths consisting of these edges.

Step one: constructing $\gamma_\mathcal{M}(w)$. Let $\tau_1 : \mathrm{GR}(\Sigma_1 \cup \{\vdash, \dashv\}, *) \to \mathrm{GR}(\{*, \triangleright, \triangleleft\}, \Sigma_2 \cup \{\lambda\})$ be the graph transduction that constructs $\gamma_\mathcal{M}(w)$. We follow the general description above, and formalize $\tau_1$ as an mso transduction.

The domain formula of the transduction specifies that the graph is of the form $\text{nd-gr}(\vdash w \dashv)$ for some string $w$. The copy set equals $C = Q$, where $Q$ is the set of states of $\mathcal{M}$. The node $(q, i)$ of $\gamma_\mathcal{M}(w)$ is identified with $u_i^q$, the $q$-copy of the node $u_i$ of $\text{nd-gr}(\vdash w \dashv)$ corresponding to the $i$-th position of the input tape, labelled with $\sigma_i$.

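The labels of the edges are chosen according to the instructions of $\mathcal{M}$. For $\alpha \in \Sigma_2 \cup \{\lambda\}$, $p, q \in Q$, and $\varepsilon \in \{-1, 0, +1\}$ let $\mathrm{step}[\varepsilon]_\alpha^{p,q}(x)$ be the following disjunction, where the unspecified 'dots' range over their respective components:

$\bigvee_{(p, \sigma, q, \alpha, \varepsilon, \cdot, \cdot, \cdot) \in \delta} \mathrm{lab}_\sigma(x) \;\vee \bigvee_{(p, \tau, \cdot, \cdot, \cdot, q, \alpha, \varepsilon) \in \delta} \neg\mathrm{lab}_\tau(x)$

Then,

$\varphi_\alpha^{p,q}(x,y) = (\mathrm{edge}_*(x,y) \wedge \mathrm{step}[+1]_\alpha^{p,q}(x)) \vee (x = y \wedge \mathrm{step}[0]_\alpha^{p,q}(x)) \vee (\mathrm{edge}_*(y,x) \wedge \mathrm{step}[-1]_\alpha^{p,q}(x))$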
All copies of the nodes are present, with special labels for initial and final nodes:

$\varphi_\triangleright^q(x) = \mathrm{lab}_\vdash(x)$ when $q = q_{in}$, and $\varphi_\triangleright^q = \mathrm{false}$ otherwise;
$\varphi_\triangleleft^q(x) = \mathrm{true}$ when $q = q_f$, and $\varphi_\triangleleft^q = \mathrm{false}$ otherwise.

Note that we assume that $q_{in} \neq q_f$, in order to avoid that both $\varphi_\triangleright^q$ and $\varphi_\triangleleft^q$ are true for the initial node. This would happen when $\mathcal{M}$ accepts an input in its initial state without executing any instruction. We satisfy the assumption by adding instructions leading to a new final state.

Step two: selecting the computation path. The transduction $\tau_2 : \mathrm{GR}(\{*, \triangleright, \triangleleft\}, \Sigma_2 \cup \{\lambda\}) \to \mathrm{GR}(*, \Sigma_2 \cup \{\lambda\})$ removes nodes that are not on the path from the node labelled by $\triangleright$ to a node labelled by $\triangleleft$ (if it exists). Nodes that are not on such a path do not correspond to configurations that are part of the (successful) computation of $\mathcal{M}$ on $w$. Note that if such a path exists, then it is unique.

Recall that the predicate $\preceq$ specifies the existence of a path from $x$ to $y$. By $x \preceq_\lambda y$ we restrict ourselves below to a path containing only edges with label λ.

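Formally,

$\varphi_{\mathrm{dom}} = (\exists x)(\exists y)[\mathrm{lab}_\triangleright(x) \wedge \mathrm{lab}_\triangleleft(y) \wedge x \preceq y]$,

$C = \{1\}$,

$\varphi_*^1(x) = (\exists y)(\exists z)[\mathrm{lab}_\triangleright(y) \wedge y \preceq x \wedge \mathrm{lab}_\triangleleft(z) \wedge x \preceq z]$
and, for $\alpha \in \Sigma_2 \cup \{\lambda\}$, $\varphi_\alpha^{1,1}(x,y) = \mathrm{edge}_\alpha(x,y)$.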

Step three: contracting λ-paths. The last of the three graph transductions, $\tau_3 : \mathrm{GR}(*, \Sigma_2 \cup \{\lambda\}) \to \mathrm{GR}(*, \Sigma_2)$, deletes all nodes that have an outgoing λ-labelled edge, and contracts each λ-path to its last node.

This can be specified with the trivial copy set $C = \{1\}$, node formula $\varphi_*^1(x) = \neg(\exists y)(\mathrm{edge}_\lambda(x,y))$, and edge formulas $\varphi_\sigma^{1,1}(x,y) = (\exists z)(\mathrm{edge}_\sigma(x,z) \wedge z \preceq_\lambda y)$, for $\sigma \in \Sigma_2$. □
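Operationally, the three steps amount to building a graph with at most one outgoing edge per configuration, following the unique path from $(q_{in}, 0)$, and dropping the λ-labels. The Python sketch below assumes a simplified instruction format, `(state, symbol) -> (new state, output, move)`, rather than the 8-tuples of Section 2, uses `[` and `]` for $\vdash$ and $\dashv$, and writes λ as the empty string. The example machine, which maps $w$ to $w w^R$, is a hypothetical illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Simplified 2dgsm instructions (an assumption of this sketch, not the paper's 8-tuples):
# (state, symbol) -> (new state, output symbol or "" for lambda, move in {-1, 0, +1})
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]

def computation_space(delta: Delta, states: Iterable[str], tape: str):
    """gamma_M(w): nodes (p, i); at most one outgoing edge per node, labelled by the output."""
    space = {}
    for p in states:
        for i, symbol in enumerate(tape):
            if (p, symbol) in delta:
                q, out, move = delta[(p, symbol)]
                if 0 <= i + move < len(tape):
                    space[(p, i)] = (out, (q, i + move))
    return space

def transduce(delta: Delta, states, q_in: str, q_f: str, w: str) -> Optional[str]:
    tape = "[" + w + "]"                      # "[" and "]" stand for the end markers
    space = computation_space(delta, states, tape)
    node, seen, output = (q_in, 0), set(), []
    while node[0] != q_f:                     # step two: follow the unique path from (q_in, 0)
        if node in seen or node not in space:
            return None                       # loop or dead end: no successful computation
        seen.add(node)
        out, node = space[node]
        output.append(out)
    return "".join(output)                    # step three: lambda labels ("") vanish

DELTA = {
    ("r", "["): ("r", "", +1), ("r", "a"): ("r", "a", +1), ("r", "b"): ("r", "b", +1),
    ("r", "]"): ("l", "", -1), ("l", "a"): ("l", "a", -1), ("l", "b"): ("l", "b", -1),
    ("l", "["): ("f", "", 0),
}

print(transduce(DELTA, "rlf", "r", "f", "ab"))  # abba
```

Following the path directly replaces the mso selection performed by $\tau_2$; the two agree here because determinism leaves at most one outgoing edge per node.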
Now that the 2dgsm has learned to understand the language of monadic second-order logic (cf. Theorem 9), the converse of the previous result has a rather straightforward proof.

21 Lemma. Let $m : \Sigma_1^* \to \Sigma_2^*$ be a string transduction.
If tape(m) $\in$ grMSO, then $m \in$ 2DGSM.

Proof. Starting with the mso transduction $\mathrm{tape}(m) : \mathrm{GR}(\Sigma_1 \cup \{\vdash, \dashv\}, *) \to \mathrm{GR}(*, \Sigma_2)$ we build a 2dgsm-mso $\mathcal{M}$ for $m$ that closely follows the mso specification of tape(m).

Assume tape(m) is specified by domain formula $\varphi_{\mathrm{dom}}$, copy set $C$, node formulas $\varphi_*^c$, $c \in C$, and edge formulas $\varphi_\sigma^{c_1,c_2}$, $c_1, c_2 \in C$, $\sigma \in \Sigma_2$. The state set of $\mathcal{M}$ is (in principle) equal to the copy set $C$: when $\varphi_\sigma^{c_1,c_2}(u,v)$ is true for a pair $u, v$ of nodes, then $\mathcal{M}$, visiting the position corresponding to $u$ of the input tape in state $c_1$, may move to the position corresponding to $v$, changing to state $c_2$, while writing $\sigma$ to the output tape.

Note that, for each input graph $g$, $\mathrm{tape}(m)(g)$ defines a graph representation of a string, hence at most one of these formulas defines an edge at a given position (node) and in a given state (copy). However, in general the formula $\varphi_\sigma^{c_1,c_2}$ is only functional as far as graphs $g$ satisfying the domain formula $\varphi_{\mathrm{dom}}$ are concerned, and for these graphs only when restricted to nodes for which the respective $c_1$- and $c_2$-copies are defined. Since our formal definition of 2dgsm-mso demands functional moves, we consider the formulas $\psi_\sigma^{c_1,c_2}(x,y) = \varphi_\sigma^{c_1,c_2}(x,y) \wedge \varphi_*^{c_1}(x) \wedge \varphi_*^{c_2}(y) \wedge \varphi_{\mathrm{dom}}$.

The instructions of $\mathcal{M}$ are of the form

$(c_1, (\exists y)(\psi_\sigma^{c_1,c_2}(x,y)), c_2, \sigma, \psi_\sigma^{c_1,c_2}(x,y))$

but this is 5-tuple notation, and has to be replaced by 8-tuples in which, for a fixed state $c_1$, each of the alternatives $(c_2, \sigma) \in C \times \Sigma_2$ is tested consecutively, as explained in the paragraph about the 2gsm in Section 2 (using additional states).

If none of the edge formulas gives a positive result, the present node has no successor, which indicates the last position of the output string. In that case, the series of consecutive tests ends up in the final state $q_f$.

Initially $\mathcal{M}$ has to find the unique node of the output graph that has no incoming edges. We solve this by adding the new initial state $q_{in}$, from which this node is found by testing all possibilities, again in a consecutive fashion, for $c_2 \in C$:

$(q_{in}, (\exists y)[\varphi_*^{c_2}(y) \wedge \neg\mathrm{incom}^{c_2}(y)], c_2, \lambda, \varphi_*^{c_2}(y) \wedge \neg\mathrm{incom}^{c_2}(y))$,

where $\mathrm{incom}^{c_2}(y)$ abbreviates $(\exists z)\bigvee_{c_1 \in C, \sigma \in \Sigma_2} \psi_\sigma^{c_1,c_2}(z,y)$. □
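Read operationally, the machine built in this proof first locates the unique output node without incoming edges, and then repeatedly takes the unique (copy, symbol, position) allowed by the restricted edge formulas ψ. The Python sketch below mimics this on the whole tape at once instead of walking, so it illustrates the logic rather than the 2dgsm-mso itself. The formulas are replaced by Python predicates, and the example specification for tape(id) (copy set {1}, hypothetical helpers `id_node` and `id_edge`, with `[` and `]` standing for $\vdash$ and $\dashv$) is our own choice.

```python
from typing import Callable, Sequence

def run_from_spec(tape: str, copies: Sequence[str], out_alphabet: str,
                  node_ok: Callable, edge_ok: Callable) -> str:
    """Follow the output graph of tape(m) on `tape`; its nodes are (position, copy) pairs."""
    live = [(u, c) for u in range(len(tape)) for c in copies if node_ok(tape, c, u)]
    succ = {}
    for (u, c1) in live:
        # psi: edge formula restricted to existing copies of both endpoints
        hits = [(s, (v, c2)) for (v, c2) in live for s in out_alphabet
                if edge_ok(tape, s, c1, c2, u, v)]
        if len(hits) > 1:
            raise ValueError("edge formulas are not functional on this input")
        if hits:
            succ[(u, c1)] = hits[0]
    targets = {node for (_, node) in succ.values()}
    starts = [n for n in live if n not in targets]    # nodes without incoming edges
    if len(starts) != 1:
        raise ValueError("the output is not a string representation")
    node, output = starts[0], []
    while node in succ:                               # no successor: last node, stop
        s, node = succ[node]
        output.append(s)
    return "".join(output)

# tape(id) with copy set {"1"}: "[" and "]" stand for the end markers.
def id_node(tape, c, u):
    return tape[u] != "]"                             # every position except the right marker

def id_edge(tape, s, c1, c2, u, v):
    return v == u + 1 and tape[v] == s                # s-edge to the right neighbour carrying s

print(run_from_spec("[ab]", ("1",), "ab", id_node, id_edge))  # ab
```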

22 Lemma. Let $\Sigma$ be an alphabet. The transduction $\mathrm{tape}(id) : \mathrm{GR}(\Sigma \cup \{\vdash, \dashv\}, *) \to \mathrm{GR}(*, \Sigma)$ mapping $\text{nd-gr}(\vdash w \dashv)$ to ed-gr(w) is an element of grMSO, as is its inverse $\mathrm{tape}(id)^{-1}$.

Proof. The identity on S* is easily performed by an 2dgsm. Hence tape(2(i) G 
grMSO, by Lemma gU[ 

As for the inverse $\mathrm{tape}(id)^{-1}$, note that mapping $\text{ed-gr}(w)$ to $\text{ed-gr}(\vdash w\dashv)$ is mso definable because $\text{ed-gr}(w)$ has at least one node, which may be copied to provide the additional nodes that are connected by edges labelled by $\vdash$ and $\dashv$ to the original graph. We now compose this mapping with ed2nd, which is mso definable by Example 15. □
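The graph encodings used in this proof can be mimicked on explicit data. The Python sketch below is our own illustration: the dictionary layouts and the functions `nd_gr`, `ed_gr`, `add_markers`, and `ed2nd` are assumptions, not the paper's mso formulas. It checks on concrete strings that adding the two marker edges and then switching to the node representation yields $\text{nd-gr}(\vdash w\dashv)$, which is exactly what $\mathrm{tape}(id)^{-1}$ has to do.

```python
def nd_gr(w):
    """Node representation: node i labelled w[i], unlabelled edges i -> i+1."""
    return {"labels": list(w), "edges": [(i, i + 1) for i in range(len(w) - 1)]}


def ed_gr(w):
    """Edge representation: |w|+1 unlabelled nodes, edge i -> i+1 labelled w[i]."""
    return {"nodes": len(w) + 1, "edges": [(i, s, i + 1) for i, s in enumerate(w)]}


def add_markers(g):
    """ed-gr(w) -> ed-gr(⊢w⊣): shift the nodes and add the two marker edges.
    (In the mso version the two new nodes are copies of an existing node,
    which is why ed-gr(w) must have at least one node.)"""
    n = g["nodes"]
    edges = [(0, "⊢", 1)] + [(i + 1, s, j + 1) for (i, s, j) in g["edges"]]
    edges.append((n, "⊣", n + 1))
    return {"nodes": n + 2, "edges": edges}


def ed2nd(g):
    """Edge to node representation: every node with an outgoing edge becomes a
    node labelled by that edge; assumes nodes are numbered along the path."""
    out_label = {src: s for (src, s, _) in g["edges"]}
    keep = sorted(out_label)
    return {"labels": [out_label[v] for v in keep],
            "edges": [(i, i + 1) for i in range(len(keep) - 1)]}


print(ed2nd(add_markers(ed_gr("ab"))))
# {'labels': ['⊢', 'a', 'b', '⊣'], 'edges': [(0, 1), (1, 2), (2, 3)]}
```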



We complete the section by deriving the equivalence between the mso definable string transductions and the deterministic two-way finite state transductions, uniting logic and machines.

23 Theorem. MSOS = 2DGSM. 

Proof. By our previous lemma, the transduction $\mathrm{tape}(id)$ from $\text{nd-gr}(\vdash w\dashv)$ to $\text{ed-gr}(w)$, for $w\in\Sigma^*$, is an element of grMSO, as is its inverse $\mathrm{tape}(id)^{-1}$. By the equalities $\mathrm{tape}(m)=\mathrm{tape}(id)\circ\text{ed-gr}(m)$ and $\text{ed-gr}(m)=\mathrm{tape}(id)^{-1}\circ\mathrm{tape}(m)$, and the closure of grMSO under composition (Proposition ??), we have $m\in\mathrm{MSOS}$ iff (by definition) $\text{ed-gr}(m)\in\mathrm{grMSO}$ iff $\mathrm{tape}(m)\in\mathrm{grMSO}$.

The result now follows from Lemmas ?? and ??, demonstrating $\mathrm{tape}(m)\in\mathrm{grMSO}$ iff $m\in\mathrm{2DGSM}$. □



As an immediate consequence of this result and Lemma 18 we obtain the equivalence between the corresponding $\lambda$-restricted transductions.

We use $\mathrm{2DGSM}_\lambda$ to denote those relations $m$ in 2DGSM that satisfy: $(\lambda,z)\in m$ implies $z=\lambda$, cf. Lemma 18.

24 Corollary. $\mathrm{MSOS}_{\mathrm{nd}}=\mathrm{2DGSM}_\lambda$.






5 Nondeterminism 

In this section we define the nondeterministic mso definable graph transductions, and their derived string relatives. We observe that nondeterministic mso transductions are related to the deterministic mso transductions via relabelling of the input.

A nondeterministic variant of mso definable transductions is considered in [Cou91], [Cou94]. All the formulas of the deterministic version may now have additional free node-set variables $X_1,\dots,X_k$, called 'parameters', the same for each of the formulas. For each valuation of the parameters (by sets of nodes of the input graph) that satisfies the domain formula, the other formulas define the output graph as before. Hence each valuation may lead to a different output graph for the given input graph: nondeterminism.

More formally, a nondeterministic mso definable (graph) transduction $\tau\subseteq GR(\Sigma_1,\Gamma_1)\times GR(\Sigma_2,\Gamma_2)$ is specified by

- a set of parameters $X_1,\dots,X_k$, $k\ge0$,

- a domain formula $\varphi_{\mathrm{dom}}(X_1,\dots,X_k)$,

- a finite copy set $C$,

- node formulas $\varphi^c_\sigma(x,X_1,\dots,X_k)$ for $\sigma\in\Sigma_2$, $c\in C$, and

- edge formulas $\varphi^{c_1,c_2}_\gamma(x,y,X_1,\dots,X_k)$ for $\gamma\in\Gamma_2$, $c_1,c_2\in C$,

where all formulas are in $MSO(\Sigma_1,\Gamma_1)$.

Recall from Section ?? that an input graph together with a valuation of the parameters can be represented by an $\mathcal{S}$-valuated graph $g$, which has node labels in $\Sigma_1\times\{0,1\}^{\mathcal{S}}$ (where $\mathcal{S}=\{X_1,\dots,X_k\}$), such that $g|\Sigma_1$ is the input graph and $\nu_g$ is the valuation. By definition, $g\in GL(\varphi_{\mathrm{dom}})$ iff $g|\Sigma_1,\nu_g\models\varphi_{\mathrm{dom}}(X_1,\dots,X_k)$.

For each $g\in GL(\varphi_{\mathrm{dom}})$ we define the graph $\hat\tau(g)$ similar to $\tau(g)$ in Definition 10. The nodes of $\hat\tau(g)$ are defined using $g|\Sigma_1\models\varphi^c(u,U_1,\dots,U_k)$, where $U_i=\nu_g(X_i)$, rather than $g\models\varphi^c(u)$, and similarly for the edges and node labelling of $\hat\tau(g)$. The transduction $\tau$ is then defined as follows:
$\tau=\{(g|\Sigma_1,\hat\tau(g))\mid g\in GL(\varphi_{\mathrm{dom}})\}$.

25 Example. Let $m\subseteq\{a\}^*\times\{a,b,\#\}^*$ be the relation

$\{(a^n,w\#w)\mid n\ge0,\ w\in\{a,b\}^*,\ |w|=n\}$.

The relation ed-gr($m$) can be realized by a nondeterministic mso definable transduction with parameters $X_a$ and $X_b$. The nodes of the input graph are copied twice, and the parameters determine whether the outgoing edge of a node in the input is copied as an $a$-edge or as a $b$-edge, respectively.

The components of the transduction are as follows. The copy set equals $C=\{1,2\}$, and the domain formula $\varphi_{\mathrm{dom}}(X_a,X_b)$ expresses that the input graph is a string representation, and additionally that the sets $X_a$ and $X_b$ form a partition of its nodes.

All input nodes are copied twice: $\varphi^1(x,X_a,X_b)=\varphi^2(x,X_a,X_b)=\mathrm{true}$.

The edge labels are changed according to the sets $X_a$ and $X_b$; additionally, the last node of the first copy is connected to the first node of the second copy by a $\#$-edge:

$\varphi^{1,1}_\sigma(x,y,X_a,X_b)=\varphi^{2,2}_\sigma(x,y,X_a,X_b)=\mathrm{edge}_a(x,y)\wedge x\in X_\sigma$, for $\sigma=a,b$,

$\varphi^{1,2}_\#(x,y,X_a,X_b)=\neg(\exists z)\,\mathrm{edge}_a(x,z)\wedge\neg(\exists z)\,\mathrm{edge}_a(z,y)$,

$\varphi^{i,j}_\sigma(x,y,X_a,X_b)=\mathrm{false}$, for all other combinations $i,j,\sigma$.

Mapping $aaa$ to $abb\#abb$ can be realized by taking the valuation $\nu(X_a)=\{1\}$, $\nu(X_b)=\{2,3,4\}$.

Note that this example can be changed such that it uses only one parameter, as the sets represented by the parameters are complementary. □
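A brute-force check of this example can be written in Python, with each mso formula replaced by an equivalent Python condition. This replacement is an assumption of the sketch, and the function name `image_example25` is ours. The function enumerates all valuations of $X_a$, taking $X_b$ to be its complement as $\varphi_{\mathrm{dom}}$ demands. For each valuation it builds the output edges from the edge formulas and reads off the resulting string.

```python
from itertools import product


def image_example25(n):
    """All outputs of the transduction of Example 25 on ed-gr(a^n)."""
    nodes = list(range(1, n + 2))                   # ed-gr(a^n) has n+1 nodes
    edge_a = {(i, i + 1) for i in range(1, n + 1)}  # a-edges i -> i+1
    first, last = nodes[0], nodes[-1]
    results = set()
    for bits in product((False, True), repeat=len(nodes)):
        # phi_dom: X_a and X_b partition the nodes, so X_b is the complement.
        X = {"a": {v for v, b in zip(nodes, bits) if b}}
        X["b"] = set(nodes) - X["a"]
        succ = {}                                   # output node -> (label, next)
        for (x, y) in edge_a:
            for sigma in "ab":
                if x in X[sigma]:                   # phi^{1,1}_sigma, phi^{2,2}_sigma
                    succ[(x, 1)] = (sigma, (y, 1))
                    succ[(x, 2)] = (sigma, (y, 2))
        succ[(last, 1)] = ("#", (first, 2))         # phi^{1,2}_#
        node, word = (first, 1), []                 # (first, 1) has no incoming edge
        while node in succ:
            sigma, node = succ[node]
            word.append(sigma)
        results.add("".join(word))
    return results


print(sorted(image_example25(2)))  # ['aa#aa', 'ab#ab', 'ba#ba', 'bb#bb']
```

The valuation of the last node never matters, because that node has no outgoing $a$-edge. This is why $2^{n+1}$ valuations yield only $2^n$ outputs.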

We use grNMSO, $\mathrm{NMSOS}_{\mathrm{nd}}$, and NMSOS to denote the nondeterministic counterparts of the families grMSO, $\mathrm{MSOS}_{\mathrm{nd}}$, and MSOS, respectively. The family of (nondeterministic) 2gsm transductions is denoted by 2NGSM.

Unlike the deterministic case, the power of the nondeterministic 2gsm is incomparable to that of the nondeterministic mso definable string transductions. First, because each input graph has only finitely many parameter valuations, every nondeterministic mso transduction is finitary. This is not true for the 2gsm, which can realize the (non-finitary) transduction $\{(a^n,a^{mn})\mid m,n\ge1\}$ by nondeterministically choosing the number $m$ of copies made of the input.

On the other hand, the nondeterministic mso transduction of the previous example cannot be realized by a 2gsm.

26 Lemma. Let $m\subseteq\{a\}^*\times\{a,b,\#\}^*$ be the relation $\{(a^n,w\#w)\mid n\ge0,\ w\in\{a,b\}^*,\ |w|=n\}$. Then $m\notin\mathrm{2NGSM}$.

Proof. Assume $m$ is realized by a (nondeterministic) 2gsm $\mathcal{M}$ with $k$ states. Choose $n$ such that $2^n>k\cdot(n+2)$. Consider the behaviour of $\mathcal{M}$ on input $a^n$. The input tape, containing $\vdash a^n\dashv$, has $n+2$ positions. Hence, $\mathcal{M}$ has $k\cdot(n+2)$ configurations on this input. Consider the configuration assumed by $\mathcal{M}$ when it has just written the symbol $\#$ on its output tape. As there are $2^n$ possible output strings $w\#w$ for $a^n$, there exist two strings $w_1\neq w_2$ for which this configuration is the same. This means that we can switch the computation of $(a^n,w_1\#w_1)$ halfway to the computation of $(a^n,w_2\#w_2)$, obtaining a computation for $(a^n,w_1\#w_2)$ with $w_1\neq w_2$, which is not an element of $m$. □
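The counting step of the proof can be made explicit. The small Python sketch below (the function name is ours) finds, for a given number $k$ of states, the least $n$ with $2^n>k\cdot(n+2)$. For that $n$, the $2^n$ possible strings $w$ must share one of the $k\cdot(n+2)$ configurations.

```python
def smallest_n(k):
    """Least n with 2**n > k * (n + 2), for a 2gsm with k states."""
    n = 0
    while 2 ** n <= k * (n + 2):
        n += 1
    return n


for k in (1, 5, 10):
    n = smallest_n(k)
    # candidate strings w versus configurations (state, head position)
    print(k, n, 2 ** n, k * (n + 2))
```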

It is not difficult to see that the relation $m$ from the lemma can be realized by the composition of two 2gsm's, the first nondeterministically mapping $a^n$ to a string $w\in\{a,b\}^*$ with $|w|=n$, the second (deterministically) doubling its input $w$ to $w\#w$. This shows that 2NGSM is not closed under composition, as proved in [Kie75] for the corresponding families of output languages. In fact, the families $\mathrm{2NGSM}^k$ of compositions of $k$ 2gsm transductions form a strict hierarchy, as proved in [Gre78a], [Eng82], [Eng91b] (again for the corresponding families of output languages).

However, the nondeterministic mso transductions are closed under composition [Cou97, Prop. 5.5.6].



27 Proposition. grNMSO, and consequently NMSOS and $\mathrm{NMSOS}_{\mathrm{nd}}$, are closed under composition.

By grREL we denote the family of (nondeterministic) node relabellings for graphs. A relation in $GR(\Sigma_1,\Gamma)\times GR(\Sigma_2,\Gamma)$ is a node relabelling if there exists a relation $R\subseteq\Sigma_1\times\Sigma_2$ such that the images of a graph $g$ are exactly those graphs that can be obtained from $g$ by replacing every occurrence of a node label $\sigma$ by an element of $R(\sigma)$, leaving edges and their labels unchanged.

We use REL to denote the family of (nondeterministic) string relabellings, 
related to grREL through the mapping nd-gr. 

We observe the following elementary relationship between deterministic 
and nondeterministic mso definable graph transductions. 

28 Theorem. $\mathrm{grNMSO}=\mathrm{grREL}\circ\mathrm{grMSO}$.
Proof. The proof of the first inclusion $\mathrm{grNMSO}\subseteq\mathrm{grREL}\circ\mathrm{grMSO}$ is implicit in our definition of grNMSO. The nondeterminism of an mso transduction $\tau$ with parameters $X_1,\dots,X_k$ can be 'pre-processed' by a relabelling $\rho$ that maps each node label $\sigma\in\Sigma_1$ nondeterministically to a symbol $(\sigma,f)\in\Sigma_1\times\{0,1\}^{\mathcal{S}}$, where $\mathcal{S}=\{X_1,\dots,X_k\}$. The valuation of $X_i$ has now become a part of the labelling, and we change the domain formula $\varphi_{\mathrm{dom}}(X_1,\dots,X_k)$, the node formulas $\varphi^c_\sigma(x,X_1,\dots,X_k)$, and the edge formulas $\varphi^{c_1,c_2}_\gamma(x,y,X_1,\dots,X_k)$ that specify the mso transduction accordingly. Each atomic subformula $y\in X_i$ in such a formula is replaced by the disjunction $\bigvee_{f(X_i)=1,\,\sigma\in\Sigma_1}\mathrm{lab}_{(\sigma,f)}(y)$, and each atomic subformula $\mathrm{lab}_\sigma(y)$ is replaced by $\bigvee_{f:\mathcal{S}\to\{0,1\}}\mathrm{lab}_{(\sigma,f)}(y)$. In this way we obtain 'deterministic' equivalents $(\hat\varphi_{\mathrm{dom}},\hat\varphi^c_\sigma(x),\hat\varphi^{c_1,c_2}_\gamma(x,y))$ for an mso transduction $\hat\tau$. We now have $\tau=\rho\circ\hat\tau$, which follows by observing that for a graph $g\in GR(\Sigma_1\times\{0,1\}^{\mathcal{S}},\Gamma_1)$, $g\models\hat\varphi_{\mathrm{dom}}$ if and only if $g|\Sigma_1,\nu_g\models\varphi_{\mathrm{dom}}(X_1,\dots,X_k)$, and similarly for the other formulas.
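The encoding of parameters into node labels used in this proof can be illustrated on strings, viewed as node-labelled paths. The Python sketch below is our own illustration with hypothetical parameter names. `relabel_all` plays the role of $\rho$, and `in_param` and `has_label` evaluate the replacements of the atoms $y\in X_i$ and $\mathrm{lab}_\sigma(y)$. Every valuation of the parameters arises from exactly one relabelling.

```python
from itertools import product

PARAMS = ("X1", "X2")  # hypothetical parameter set S = {X1, X2}


def relabel_all(word):
    """rho: every relabelling of word that replaces sigma by some (sigma, f)."""
    vectors = list(product((0, 1), repeat=len(PARAMS)))  # f as a 0/1 tuple
    return list(product(*[[(s, f) for f in vectors] for s in word]))


def in_param(g, y, i):
    """Replacement of the atom 'y in X_i': y carries a label (sigma, f), f(X_i) = 1."""
    return g[y][1][i] == 1


def has_label(g, y, sigma):
    """Replacement of lab_sigma(y): y carries a label (sigma, f) for some f."""
    return g[y][0] == sigma


def valuation(g):
    """The valuation nu_g encoded in the labels of g, one node set per parameter."""
    return tuple(frozenset(y for y in range(len(g)) if in_param(g, y, i))
                 for i in range(len(PARAMS)))


relabelled = relabel_all("ab")
print(len(relabelled), len({valuation(g) for g in relabelled}))
```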

For the converse inclusion $\mathrm{grNMSO}\supseteq\mathrm{grREL}\circ\mathrm{grMSO}$, it suffices to note that each nondeterministic node relabelling is a nondeterministic mso definable graph transduction (and, trivially, $\mathrm{grMSO}\subseteq\mathrm{grNMSO}$). The inclusion then follows from the closure of grNMSO under composition, Proposition 27.

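Let $R\subseteq\Sigma_1\times\Sigma_2$ define a graph node relabelling. We formalize it as an mso graph transduction from $GR(\Sigma_1,\Gamma)$ to $GR(\Sigma_2,\Gamma)$ by choosing parameters $X_\tau$, $\tau\in\Sigma_2$, with the intended meaning that a node belonging to $X_\tau$ will be relabelled into $\tau$.

The domain formula $\varphi_{\mathrm{dom}}$ expresses that the $X_\tau$ form an 'admissible' parameter set by demanding that each node is in exactly one of the $X_\tau$, and additionally that, if a node has label $\sigma$, then the $X_\tau$ containing this node satisfies $\tau\in R(\sigma)$:

$(\forall x)\bigvee_{\tau\in\Sigma_2}\Big(x\in X_\tau\wedge\bigwedge_{\tau'\neq\tau}x\notin X_{\tau'}\Big)\ \wedge\ (\forall x)\bigwedge_{\sigma\in\Sigma_1}\Big(\mathrm{lab}_\sigma(x)\rightarrow\bigvee_{\tau\in R(\sigma)}x\in X_\tau\Big)$

Each node is copied once, relabelled according to the $X_\tau$:

$\varphi^1_\tau(x)=x\in X_\tau$, for $\tau\in\Sigma_2$,

$\varphi^{1,1}_\gamma(x,y)=\mathrm{edge}_\gamma(x,y)$, for $\gamma\in\Gamma$. □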





As we have observed, any string relabelling can be 'lifted' to a graph node 
relabelling using the graph interpretation nd-gr of strings. By restricting the 
previous result to those graph transductions that result from strings, we obtain a result for mso definable string transductions in the node interpretation.

29 Corollary. $\mathrm{NMSOS}_{\mathrm{nd}}=\mathrm{REL}\circ\mathrm{MSOS}_{\mathrm{nd}}$.

In addition to REL, we need MREL, denoting the family of marked string relabellings, which map a string $w$ first to the 'marked version' $\vdash w\dashv$, and then apply a string relabelling.
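String relabellings and marked string relabellings can be enumerated directly. In the Python sketch below, the relation $R$ is a dictionary from symbols to sets of symbols, the markers are written as the characters ⊢ and ⊣, and the function names are ours. Note that a marked relabelling can map the empty string to several outputs, whereas a plain relabelling maps it only to itself.

```python
from itertools import product


def relabellings(w, R):
    """All images of w under the string relabelling R (symbol -> set of symbols)."""
    return {"".join(t) for t in product(*(sorted(R[s]) for s in w))}


def marked_relabellings(w, R):
    """MREL: mark w as ⊢w⊣ first, then relabel every symbol, markers included."""
    return relabellings("⊢" + w + "⊣", R)


R = {"⊢": {"0", "1"}, "a": {"a", "b"}, "⊣": {"⊣"}}  # hypothetical relation
print(relabellings("", R))                  # {''}
print(sorted(marked_relabellings("", R)))   # ['0⊣', '1⊣']
```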

30 Theorem. $\mathrm{NMSOS}=\mathrm{MREL}\circ\mathrm{MSOS}$.

Proof. First, the inclusion from left to right. Let $m\in\mathrm{NMSOS}$, i.e., $\text{ed-gr}(m)\in\mathrm{grNMSO}$.

Consider the string transduction $m'=\{(\vdash w\dashv,z)\mid(w,z)\in m\}$. Then $m'$ is an element of $\mathrm{NMSOS}_{\mathrm{nd}}$, as $\text{nd-gr}(m')$ equals the composition $\mathrm{tape}(id)\circ\text{ed-gr}(m)\circ\mathrm{ed2nd}$ of (nondeterministic) mso definable graph transductions, where $\mathrm{tape}(id)$ is the mapping from $\text{nd-gr}(\vdash w\dashv)$ to $\text{ed-gr}(w)$, cf. Lemma 22. By the corollary above and Lemma 18, $m'\in\mathrm{REL}\circ\mathrm{MSOS}_{\mathrm{nd}}\subseteq\mathrm{REL}\circ\mathrm{MSOS}$. Consequently, as $m$ equals the 'marking' from $w$ to $\vdash w\dashv$ followed by $m'$, we have $m\in\mathrm{MREL}\circ\mathrm{MSOS}$.

For the reverse inclusion, $\mathrm{NMSOS}\supseteq\mathrm{MREL}\circ\mathrm{MSOS}$, note that every marked relabelling can be decomposed into a marking and a relabelling, each of which we will show to be a (nondeterministic) mso transduction. The inclusion then follows from the closure of NMSOS under composition.

The marking mapping $w$ to $\vdash w\dashv$ is easily seen to be an element of MSOS, either by direct construction, or by constructing a 2dgsm for that task and applying Theorem 23.



Finally, to show that $\mathrm{REL}\subseteq\mathrm{NMSOS}$, one closely follows the argumentation in the proof of $\mathrm{grREL}\subseteq\mathrm{grNMSO}$, Theorem 28. As we relabel edges, rather than nodes, in the representation ed-gr($w$) of a string $w$, but still have parameters ranging over nodes, we use the parameters for the source node of an edge to determine the new label of its outgoing edge (cf. Example 25): $\varphi_{\mathrm{dom}}$ is as before, but we now have $\varphi^1=\mathrm{true}$ and $\varphi^{1,1}_\tau(x,y)=\mathrm{edge}(x,y)\wedge x\in X_\tau$. □
For completeness we note that the above result cannot be strengthened to $\mathrm{NMSOS}=\mathrm{REL}\circ\mathrm{MSOS}$, as the relations on the right side are functional for the empty string $\lambda$. This is not necessarily true for NMSOS.

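31 Example. The string transduction $\{(\lambda,a),(\lambda,b)\}$ in $a^*\times\{a,b\}^*$ is realized by the following nondeterministic mso transduction, in the edge representation. The single parameter $X$ determines whether $\lambda$ is mapped to $a$ or to $b$. Let $\varphi_{\mathrm{dom}}(X)$ express that the input graph consists of a single node (the edge representation of $\lambda$), let $C=\{1,2\}$ with $\varphi^1(x,X)=\varphi^2(x,X)=\mathrm{true}$, and let $\varphi^{1,2}_a(x,y,X)=(x=y\wedge x\in X)$ and $\varphi^{1,2}_b(x,y,X)=(x=y\wedge x\notin X)$, all other edge formulas being false. □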

Combining the previous two results (which relate the nondeterministic and deterministic mso transductions) with the equalities between deterministic mso transductions and deterministic gsm mappings of Theorem 23 and Corollary 24, we directly obtain the following result.

32 Theorem.

$\mathrm{NMSOS}=\mathrm{MREL}\circ\mathrm{2DGSM}$ and $\mathrm{NMSOS}_{\mathrm{nd}}=\mathrm{REL}\circ\mathrm{2DGSM}_\lambda$.
6 Finite Visit Machines



Rajlich [Raj75] observes that 2gsm are more powerful than 2dgsm (as generative devices, by considering their output languages, i.e., the ranges of the transductions). He demonstrates that this is mainly due to the ability of the 2gsm to visit each of the positions of its input an unbounded number of times.

Motivated by this result, we consider transducers that have a fixed bound on the number of times they visit each of their input positions (we call this the finite visit property), and relate these to the (nondeterministic) mso transductions.

We show that the nondeterministic mso definable string transductions are exactly those transductions that are realized by the composition of two 2gsm with the finite visit property. Note that one direction of this result follows from Theorem 32.



Moreover, we characterize the nondeterministic mso definable string trans- 
ductions as those compositions of 2gsm's that realize finitary transductions, 
i.e., transductions that define a finite number of images for every input string. 

A more direct characterization can be obtained by considering 2gsm that are allowed to rewrite the symbols on their input tape (but with the finite visit property). These machines exactly match the mso definable string transductions, both in the deterministic case and in the nondeterministic case.

The finite visit property was studied in, e.g., [Hen65], [Raj75], [Gre78a], [Gre78b], [Gre78c], [ERS80], [Eng82].



6.1 Finite visit two-way generalized sequential machines 

A computation of a 2gsm is called $k$-visiting if each of the positions of the input tape is visited at most $k$ times. The 2gsm $\mathcal{M}$ is called finite visit if there is a constant $k$ such that, for each pair $(w,z)$ in the transduction realized by $\mathcal{M}$, there exists a $k$-visiting computation for $(w,z)$. The family of string transductions realized by finite visit nondeterministic 2gsm is denoted by $\mathrm{2NGSM}_{\mathrm{fin}}$.

Note that our definition is rather weak, as the machine may have many computations that are not $k$-visiting, either without any chance of reaching the final state, or with loops in the computation that produce no output.

If a deterministic 2gsm visits a position of the input tape twice in the same state, then the computation will enter an infinite loop that will not reach the final state. This implies the well-known fact that every deterministic 2gsm is finite visit, where we choose for $k$ the number of states of the machine. A similar argument enables us to prove the following characterization of finite visit transductions in terms of transductions that map each input string into a finite number of output strings.



33 Lemma. Let $m$ be a string transduction. Then $m\in\mathrm{2NGSM}_{\mathrm{fin}}$ iff $m\in\mathrm{2NGSM}$ and $m$ is finitary.

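Proof. Clearly, the length of the output of a $k$-visiting computation on input $w$ is at most $k$ times the length of $\vdash w\dashv$, multiplied by the maximal length of an output string in an instruction of the machine. As the output alphabet is finite, each input therefore has finitely many outputs. Hence the implication from left to right.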

As for the other implication, assume that the finitary transduction $m$ is realized by a 2gsm $\mathcal{M}$. If during a (successful) computation for $(w,z)\in m$, $\mathcal{M}$ visits the same position twice in the same state, then it did not write symbols to the output in the meantime, because otherwise $\mathcal{M}$ would have infinitely many output strings for the present input, as an easy pumping argument shows. Hence we may omit this excursion from the computation. Consequently, there is a computation of $\mathcal{M}$ for $(w,z)$ that does not visit any of the tape positions more than $k$ times, where $k$ is the number of states of $\mathcal{M}$. Hence $\mathcal{M}$ itself is finite visit. □

It is well known (see, e.g., [Fii69], [ChJa77], [Gre78a], [Gre78b], [AhUl70]) that the computation of a finite visit 2gsm on an input tape can be coded as a string of 'visiting sequences' (strongly related to 'crossing sequences', cf. [Rab63], [Hen65], [HoUl79], [Bir96]). We recall how this can be done, without going into details.

We consider several types of visits during a computation, differing in the direction ($-1$, $0$, or $+1$) of the steps taken by the machine just before and just after the visit. Additionally, a visit may be either the first or the last visit of the computation.

Given a computation of a 2gsm, the visiting sequence of a position of the input tape is the sequence that starts with the symbol $\sigma$ on the tape, followed by the consecutive visits of the machine to that position. Each of the visits is given as a 4-tuple $(^-\varepsilon,p,{}^+\varepsilon,\alpha)$ consisting of the direction $^-\varepsilon$ of the move before the visit, the state $p$ during the visit, the direction $^+\varepsilon$ of the move after the visit, and the string $\alpha$ written to the output during that move. For the first visit we take $^-\varepsilon=*$, and for the last visit we take $^+\varepsilon=*$.
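Visiting sequences can be read off mechanically from a run. The Python sketch below assumes a hypothetical trace format with one tuple (position, state, move, output) per step, where move is `None` for the last step, taken in the final state. It groups the visits by position in the 4-tuple form $(^-\varepsilon,p,{}^+\varepsilon,\alpha)$ used above, with the empty string standing for $\lambda$.

```python
def visiting_sequences(tape, trace):
    """Group the steps of a run into visiting sequences, one per tape position.

    tape:  list of symbols of ⊢w⊣
    trace: list of (position, state, move, output), one entry per step, where
           move is -1, 0 or +1, and None for the last step (final state)
    """
    seqs = [[symbol] for symbol in tape]
    move_before = "*"                                # the first visit has -e = *
    for pos, state, move, output in trace:
        move_after = "*" if move is None else move   # the last visit has +e = *
        seqs[pos].append((move_before, state, move_after, output))
        move_before = move
    return [tuple(s) for s in seqs]


tape = ["⊢", "a", "⊣"]
trace = [(0, 0, +1, ""), (1, 1, -1, "x"), (0, 2, +1, ""),
         (1, 3, 0, ""), (1, 4, +1, "y"), (2, 5, None, "")]
for seq in visiting_sequences(tape, trace):
    print(seq)
```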

We illustrate this notion with an example. 
Figure 6: Visiting sequences for Example ??, cf. Example 34



34 Example. Consider the 2dgsm from Example ??. Each of the visiting sequences during a successful computation is one of the following.

$(\vdash,\ (*,0,+1,\lambda),(-1,3,+1,\lambda))$

$(\vdash,\ (*,0,+1,\lambda))$

$(a,\ (+1,1,+1,a),(-1,3,-1,b),(+1,4,+1,\lambda))$

$(a,\ (+1,1,+1,a))$

$(b,\ (+1,1,0,\lambda),(0,2,-1,\lambda),(+1,4,+1,\lambda),(-1,3,+1,\lambda))$

$(b,\ (+1,1,0,\lambda),(0,2,-1,\lambda),(+1,4,+1,\lambda))$

$(\dashv,\ (+1,1,0,\lambda),(0,2,0,\lambda),(0,5,*,\lambda))$

These visiting sequences are depicted in a suitable graphical manner in Figure 6, cf. Figure ??. □

Each visiting sequence must satisfy some syntactical constraints. 

First, the directions of the visits are 'alternating'. This means that the first visit enters from the left ($^-\varepsilon=+1$, with the exception of $\sigma=\vdash$, where the computation starts in the initial state with $^-\varepsilon=*$); then, if the move after the $i$-th visit equals $^+\varepsilon=-1,0,+1$, the move prior to the $(i+1)$-st visit to the same position must equal $^-\varepsilon'=+1,0,-1$, respectively. Only the last visit of a sequence can have $^+\varepsilon=*$, in case the state is final, signalling the end of a computation.

Secondly, the direction $^+\varepsilon$ of the move after the visit, and the string $\alpha$ written to the output, must correspond to an instruction of the machine for the given input symbol $\sigma$ and the given state $p$. Additionally, when $^+\varepsilon=0$, the new state given by the instruction must match the state of the next visit in the visiting sequence.
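The alternation constraint is a simple local check on one visiting sequence. The Python sketch below uses our own encoding: visits are tuples, `"*"` marks the unspecified direction, and the empty string stands for $\lambda$. It tests the constraint on the visiting sequences of Example 34. It covers only this first constraint, not the consistency with the machine's instructions.

```python
BACK = {-1: +1, 0: 0, +1: -1}  # move after visit i -> move before visit i+1


def alternates(seq):
    """Check the alternation constraint on (symbol, visit_1, ..., visit_n),
    each visit being (move_before, state, move_after, output)."""
    symbol, visits = seq[0], seq[1:]
    first_move = "*" if symbol == "⊢" else +1    # enter from the left, except at ⊢
    if not visits or visits[0][0] != first_move:
        return False
    for (_, _, move_after, _), (move_before, _, _, _) in zip(visits, visits[1:]):
        if move_after == "*" or move_before != BACK[move_after]:
            return False                         # only the last visit may end with *
    return True


examples = [
    ("⊢", ("*", 0, +1, ""), (-1, 3, +1, "")),
    ("a", (+1, 1, +1, "a"), (-1, 3, -1, "b"), (+1, 4, +1, "")),
    ("b", (+1, 1, 0, ""), (0, 2, -1, ""), (+1, 4, +1, ""), (-1, 3, +1, "")),
    ("⊣", (+1, 1, 0, ""), (0, 2, 0, ""), (0, 5, "*", "")),
]
print([alternates(s) for s in examples])
```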

Clearly, neighbouring visiting sequences for a given computation must also satisfy several constraints. If a visiting sequence has $k$ 'crossings' to the right, either outgoing visits $(^-\varepsilon,p,+1,\alpha)$ or incoming visits $(-1,p,{}^+\varepsilon,\alpha)$ (they alternate), then the visiting sequence to the right has exactly $k$ matching crossings to the left, matching both in direction (which implicitly follows from the restrictions on single visiting sequences above) and in state change for the machine. Note that a visit $(-1,p,+1,\alpha)$ represents two crossings.

Finally, the first visiting sequence of a computation should start with a visit $(*,q_{in},{}^+\varepsilon,\alpha)$, and exactly one visiting sequence should end with a visit $(^-\varepsilon,q_f,*,\alpha)$.

When we bound the number of visits to each position, the visiting sequences come from a finite set, and we can interpret these sequences as symbols from a finite alphabet. Each $k$-visiting computation is specified by a string over this alphabet, and we will call these strings $k$-tracks. (E.g., the track in Figure 7 specifies the computation of the 2dgsm of Example ?? on input $a^{?}b^{?}aba$, cf. Figure ??.) It should be obvious from the above remarks that the language of such specifications is regular (see, e.g., Lemma 2.2 of [Gre78a], or Lemma 1 of [ChJa77]). For instance, it is the heart of the proof in [HoUl79, Theorem 2.5] of the result that two-way finite state automata are equivalent to their one-way counterparts [RaSc59], [She59].



35 Proposition. Let $\mathcal{M}$ be a 2gsm, and let $k$ be a constant. The $k$-tracks for successful $k$-visiting computations of $\mathcal{M}$ form a regular language.



From this result, using standard techniques (see, e.g., [ChJa77, Lemma 1]), we obtain the following decomposition of finite visit nondeterministic 2gsm transductions. Note that this decomposition already features in Theorem 32 as a characterization of NMSOS.

36 Lemma. $\mathrm{2NGSM}_{\mathrm{fin}}\subseteq\mathrm{MREL}\circ\mathrm{2DGSM}=\mathrm{NMSOS}$.
Figure 7: Track for $\vdash a^{?}b^{?}aba\dashv$, Example ??



Proof. Let $\mathcal{M}$ be a 2gsm that is finite visit for the constant $k$; each pair $(w,z)$ in the transduction realized by $\mathcal{M}$ can be computed by a $k$-visiting computation.

We may decompose the behaviour of $\mathcal{M}$ on input $w$ as follows. First, a relabelling of $\vdash w\dashv$ guesses a string of $k$-visiting sequences, one for each position of the input tape. Then, a 2dgsm verifies in a left-to-right scan whether the string specifies a valid computation, a track, of $\mathcal{M}$ for $w$, cf. Proposition 35. If this is the case, the 2dgsm returns to the left tape marker $\vdash$ and simulates $\mathcal{M}$ on this input, following the $k$-visiting computation previously guessed.

When changing from one tape position to a neighbouring position, the 2dgsm records the 'crossing number' of that move, i.e., the number of times it has crossed the border between these two tape positions (in either direction). The crossing number can be read by inspecting the directions of the moves stored in the visiting sequence. It is used to 'enter' the next visiting sequence at the right visit, cf. Figure ??. □

37 Theorem. $\mathrm{NMSOS}=\mathrm{2NGSM}_{\mathrm{fin}}\circ\mathrm{2NGSM}_{\mathrm{fin}}$.

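Proof. By the last lemma, $\mathrm{2NGSM}_{\mathrm{fin}}\subseteq\mathrm{NMSOS}$. As the right-hand side of this inclusion is closed under composition (Proposition 27), we have the inclusion $\mathrm{2NGSM}_{\mathrm{fin}}\circ\mathrm{2NGSM}_{\mathrm{fin}}\subseteq\mathrm{NMSOS}$.

According to Theorem 32, NMSOS equals $\mathrm{MREL}\circ\mathrm{2DGSM}$. The inclusion from left to right follows from the fact that both $\mathrm{MREL}\subseteq\mathrm{2NGSM}_{\mathrm{fin}}$ and $\mathrm{2DGSM}\subseteq\mathrm{2NGSM}_{\mathrm{fin}}$. □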
It is instructive to note that this characterization implies the (apparently new) result that $\mathrm{2NGSM}_{\mathrm{fin}}\circ\mathrm{2NGSM}_{\mathrm{fin}}$ is closed under composition. This should be contrasted with the fact that $\mathrm{2NGSM}_{\mathrm{fin}}$ itself is not closed under composition. This follows from the observation in the preceding section that the relation $m$ from Example 25 does not belong to $\mathrm{2NGSM}\supseteq\mathrm{2NGSM}_{\mathrm{fin}}$ (Lemma 26). As we have observed, it can be realized as a composition of two 2gsm's, the first one nondeterministically changing a string $a^n$ to a string $w\in\{a,b\}^*$ with $|w|=n$, the second one duplicating $w$ into $w\#w$. Both of these 2gsm's are finite visit. (Alternatively, by Example 25, $m\in\mathrm{NMSOS}$, which equals $\mathrm{2NGSM}^2_{\mathrm{fin}}$ as we have just seen.)

The families 2DGSM, $\mathrm{2NGSM}_{\mathrm{fin}}$, and $\mathrm{2NGSM}^2_{\mathrm{fin}}$ form a hierarchy of transductions. However, as far as their output languages are concerned (ranges, or equivalently, with regular input), these three families are equally powerful
^ie7|,|Gfe7Sg. 



Recall that the families 2NGSM and NMSOS are incomparable; see the discussion preceding Lemma 26. We have a surprising characterization of their intersection.

38 Theorem. $\mathrm{2NGSM}\cap\mathrm{NMSOS}=\mathrm{2NGSM}_{\mathrm{fin}}$.

Proof. Obviously $\mathrm{2NGSM}_{\mathrm{fin}}\subseteq\mathrm{2NGSM}$, while $\mathrm{2NGSM}_{\mathrm{fin}}\subseteq\mathrm{NMSOS}$ by Theorem 37, which proves the inclusion from right to left.

The reverse inclusion is immediate from Lemma 33: recall that transductions in NMSOS are finitary because the number of parameter valuations is finite. □



Combining this theorem and the related Lemma 33, we obtain that a 2gsm string transduction is mso definable if and only if it is finitary. This generalizes a similar result of Courcelle [Cou94, Proposition 6.1] for rational transductions (i.e., string transductions realized by 2gsm that never move to the left). It can be extended to arbitrary compositions of two-way gsm's, as we shall see in our next main result, Theorem 41.

As a preparation for this result (and its proof), we would like to point out that 'pumping' computations for finite visit transductions (iterating suitable segments of tracks) does not only result in duplication of parts of the output, but may also rearrange neighbouring segments of the output. We illustrate this with an example.
39 Example. The 2gsm $\mathcal{M}$ has states 1 to 6, initial state 1, final state 6, and transitions $(p,\sigma,q,\alpha,\varepsilon,p,\lambda,0)$, where the move $(q,\alpha,\varepsilon)$ for each pair $p\in\{1,2,\dots,5\}$, $\sigma\in\{\vdash,a,b,\dashv\}$ is given in the following matrix (rows: input symbol $\sigma$; columns: state $p$).

σ \ p    1           2           3           4           5
⊢        (1,λ,+1)    (3,b,0)     (3,λ,+1)    (5,b,0)     (5,λ,+1)
a        (1,a,+1)    (2,a,-1)    (3,a,+1)    (4,a,-1)    (5,a,+1)
b        (2,λ,-1)    (4,λ,-1)    (1,λ,+1)    (5,λ,+1)    (3,λ,+1)
⊣        (2,c,0)     (2,λ,-1)    (4,c,0)     (4,λ,-1)    (6,λ,0)

(Note that the machine is nondeterministic in our setting, but is obtained by adding dummy alternatives to a deterministic automaton in the 5-tuple framework, see Section ??.)

On each segment of $a$'s of the input, $\mathcal{M}$ makes five passes in states 1 to 5, each in alternate directions, while copying the letters to the output.

On a letter $b$ the machine does not generate output, but it performs a permutation of the order in which the two neighbouring segments of $a$'s are read. This is best explained by looking at the computations on the input strings $a^3b^ia^2$, $i=0,1,2$, as depicted in Figure 8. The output strings for these inputs are given in the following table.



input string             output string

$a^3b^0a^2$              $a^5ca^5ba^5ca^5ba^5 = a^3(a^2ca^2)(a^3ba^3)(a^2ca^2)(a^3ba^3)a^2$
$a^3b^1a^2$              $a^6ba^5ca^5ba^5ca^4 = a^3(a^3ba^3)(a^2ca^2)(a^3ba^3)(a^2ca^2)a^2$
$a^3b^ia^2$, $i\ge2$     $a^6ba^6ba^5ca^4ca^4$



As we have seen, the introduction of the symbol $b$ in the input does not generate new output. Instead, it rearranges the parts of the computation that extend to both sides of the symbol.
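The matrix and the output table can be cross-checked by simulation. The Python sketch below transcribes the matrix of this example into a dictionary. In our encoding, the empty string stands for $\lambda$, and the dummy alternatives of the 8-tuples are omitted so that the machine runs deterministically. The sketch runs the machine on $\vdash w\dashv$ until the final state 6 is reached.

```python
# (state, symbol) -> (new state, output, move), transcribed from the matrix.
DELTA = {
    (1, "⊢"): (1, "", +1), (2, "⊢"): (3, "b", 0), (3, "⊢"): (3, "", +1),
    (4, "⊢"): (5, "b", 0), (5, "⊢"): (5, "", +1),
    (1, "a"): (1, "a", +1), (2, "a"): (2, "a", -1), (3, "a"): (3, "a", +1),
    (4, "a"): (4, "a", -1), (5, "a"): (5, "a", +1),
    (1, "b"): (2, "", -1), (2, "b"): (4, "", -1), (3, "b"): (1, "", +1),
    (4, "b"): (5, "", +1), (5, "b"): (3, "", +1),
    (1, "⊣"): (2, "c", 0), (2, "⊣"): (2, "", -1), (3, "⊣"): (4, "c", 0),
    (4, "⊣"): (4, "", -1), (5, "⊣"): (6, "", 0),
}
FINAL = 6


def run(w, max_steps=10_000):
    """Run the machine on ⊢w⊣ from state 1 at ⊢ and return its output."""
    tape = ["⊢", *w, "⊣"]
    state, pos, out = 1, 0, []
    for _ in range(max_steps):
        if state == FINAL:
            return "".join(out)
        state, alpha, move = DELTA[(state, tape[pos])]
        out.append(alpha)
        pos += move
    raise RuntimeError("no final state reached within max_steps")


for w in ("aaaaa", "aaabaa", "aaabbaa", "aaabbbaa"):
    print(w, run(w))
```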

Consider the boundary between two tape positions, where we want to insert a symbol $b$. Let $x_1,z_1,x_2,z_2,x_3,z_3$ be the strings written to the output during the consecutive parts of the computation that visit the left ($x_i$) and right ($z_i$) segments of the tape, see Figure 9. The output generated is thus

$x_1z_1x_2z_2x_3z_3$.

Now we introduce $b$ at the selected boundary, and obtain the new output $x_1x_2z_1x_3z_2z_3$. This rearrangement of the output can be formalized by the application of the substitution $\sigma_b:[z_1,z_2,z_3\leftarrow\lambda,z_1,z_2z_3]$, where $z_i$ is a formal parameter rather than a specific string.
Figure 8: Computations for Example 39 (inputs $\vdash aaaaa\dashv$, $\vdash aaabaa\dashv$, $\vdash aaabbaa\dashv$)
The effect of introducing $bb$ can be computed by the composition $\sigma_b\sigma_b:[z_1,z_2,z_3\leftarrow\lambda,\lambda,z_1z_2z_3]$, which defines the rearrangement $x_1x_2x_3z_1z_2z_3$ of the output. Note that $\sigma_b^i=\sigma_b^2$ for $i\ge2$. □
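The substitutions can be manipulated symbolically. In the Python sketch below, a word is a list of variable names and a substitution maps each $z_i$ to a list of variables; this representation is our own. `compose(s, t)` applies $s$ first and then $t$. The sketch reproduces the rearrangements for $b$ and $bb$ and confirms that $\sigma_b^3=\sigma_b^2$.

```python
def apply(sub, word):
    """Apply a substitution (variable -> list of variables) to a word;
    variables not mentioned in sub (the x_i) are left unchanged."""
    return [v for x in word for v in sub.get(x, [x])]


def compose(s, t):
    """s followed by t: each z is mapped to t applied to s(z)."""
    return {z: apply(t, s[z]) for z in s}


SIGMA_B = {"z1": [], "z2": ["z1"], "z3": ["z2", "z3"]}
TEMPLATE = ["x1", "z1", "x2", "z2", "x3", "z3"]

BB = compose(SIGMA_B, SIGMA_B)
print(apply(SIGMA_B, TEMPLATE))   # output order after inserting b
print(BB, apply(BB, TEMPLATE))    # substitution and output order for bb
```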



40 Lemma. Let $m$ be a finitary string transduction, and let $\mathcal{X}$ be a family of string transductions.

If $m\in\mathcal{X}\circ\mathrm{2NGSM}\circ\mathrm{2DGSM}$, then $m\in\mathcal{X}\circ\mathrm{MREL}\circ\mathrm{2DGSM}$.

Proof. Assume that the finitary transduction $m$ is a composition $m=m_0\circ m_1\circ m_2$ as in the statement of the lemma, with $m_0\in\mathcal{X}$, $m_1$ realized by the 2gsm $\mathcal{M}_1$, and $m_2$ realized by the 2dgsm $\mathcal{M}_2$. As is to be expected, the unknown family $\mathcal{X}$ will not feature in our arguments, but it will later enable us to apply the result in a context. In fact, we show how to replace $m_1\circ m_2$ by $\hat m_1\circ\hat m_2\in\mathrm{MREL}\circ\mathrm{2DGSM}$ such that $m_0\circ m_1\circ m_2=m_0\circ\hat m_1\circ\hat m_2$. Hence $m_1\circ m_2$ equals $\hat m_1\circ\hat m_2$ on the range of $m_0$.



Reconsider the proof of Lemma 36, where a $k$-visit 2gsm is decomposed into a relabelling that guesses a $k$-visiting sequence for each position of the input tape, and a 2dgsm that verifies in a single left-to-right pass whether the resulting string defines a $k$-track, and then deterministically simulates the specified computation for the original input. Alternatively, by combining the verification phase with the relabelling, we may decompose the $k$-visit 2gsm into a one-way gsm that nondeterministically writes a $k$-track, and a 2dgsm simulating the computation.

We apply that new decomposition to $\mathcal{M}_2$, and immediately observe that the first phase (guessing and writing a track) can be performed by $\mathcal{M}_1$ using a straightforward direct product construction.

Summarizing: we have replaced the composition $m_1\circ m_2$ by a new composition $m_1'\circ m_2'$ realized by $\mathcal{M}_1'$ followed by $\mathcal{M}_2'$, where $\mathcal{M}_1'$ is a 2gsm that writes valid tracks for the 2dgsm $\mathcal{M}_2'$. Let $\mathcal{M}_2'$ be $k$-visit.

We continue by demonstrating that we need not consider all computations of $\mathcal{M}_1'$; instead it suffices to put a bound on the number of visits that the machine makes to each of the positions of its input. This will change the transduction $m_1'$ realized by $\mathcal{M}_1'$, but not the composition $m_0\circ m_1'\circ m_2'$ (due to $m$ being finitary).

Consider the behaviour of $\mathcal{M}_1'$ on input $w$, where $w$ is in the range of $m_0$. Fix a position on the tape $\vdash w\dashv$ and a state of $\mathcal{M}_1'$, and split the output of $\mathcal{M}_1'$ during the computation into segments corresponding to the consecutive visits to the selected position in the selected state. $\mathcal{M}_1'$ writes $xy_1y_2\cdots y_tz$, where $y_i$ is written during the excursions in between consecutive visits. We assume $t\ge1$.

Figure 9: Visualization of rearrangements

Returning to the same position and state, each of the excursions can be repeated in (or omitted from) the computation of $\mathcal{M}_1'$, so the machine may produce every string $xyz$, $y\in\{y_1,y_2,\dots,y_t\}^*$, as a possible output on input $w$. By our previous construction, each output of $\mathcal{M}_1'$ forms a $k$-track for the second machine $\mathcal{M}_2'$. This implies that $\mathcal{M}_2'$ does not generate output during any of its visits to the segments $y_i$, as $m$ is supposed to be finitary.

At first glance, the excursion of $\mathcal{M}_1'$ writing $y=y_1\cdots y_t$ can be omitted: the second machine $\mathcal{M}_2'$ does not generate output when it visits the segment $y$ during its simulation of the specified computation. However, the previous example shows that $y$ (or in fact any segment $y_i$) may have its effect on the output of $\mathcal{M}_2'$ by rearranging parts of the adjacent computation that leave the segment $y$ (to the left or to the right) in order to return there later.

We consider the computation of $\mathcal{M}_2'$ specified by the track $xyz$ from the viewpoint of the segment $y$. Starting from the leftmost position of $x$, the computation enters $y$ from the left. Before leaving the segment for the last time, the computation makes several tours outside $y$.

Such a tour of $\mathcal{M}_2'$ to the left of the segment $y$, in $x$, corresponds to two consecutive visits $(^-\varepsilon,p,-1,\lambda)$ and $(+1,p',{}^+\varepsilon,\lambda)$ in the first visiting sequence of $y$, meaning that the computation leaves the segment to the left in state $p$ and returns there later in state $p'$. A symmetric observation holds for tours to the right, in $z$, and consecutive visits in the last visiting sequence of $y$.

Hence, the relative order of those tours that leave to the left is fixed by the last visiting sequence of $x$, and similarly the order of the tours to the right is fixed by the first visiting sequence of $z$. The relative order of all tours (left and right taken together) is determined by the segment $y$. Replacing $y$ by another string in $\{y_1,y_2,\dots,y_t\}^*$ will not change the tours in $x$ and $z$, but it may rearrange the relative order of tours to the left and tours to the right.

A visiting sequence for M_2 contains at most k visits. Hence, there are fewer than k tours to each side of the segment. Together these at most 2k tours may be ordered in fewer than K = (2k choose k) ways (the orders of the tours at the same side of the segment are fixed).
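The counting step can be checked mechanically. The Python sketch below (the tour labels "L0", "R0", ... are placeholders of our own) enumerates every order of a left tours and b right tours that keeps the order within each side, and confirms that there are (a+b choose a) such orders, which stays below K = (2k choose k) whenever a, b < k.

```python
from itertools import combinations
from math import comb

def interleavings(left, right):
    """All merges of two tour sequences that keep the order within each side."""
    n = len(left) + len(right)
    merges = []
    for slots in combinations(range(n), len(left)):   # positions of the left tours
        chosen = set(slots)
        li, ri = iter(left), iter(right)
        merges.append(tuple(next(li) if i in chosen else next(ri) for i in range(n)))
    return merges

def check_bound(k):
    """Check that #interleavings = C(a+b, a) < C(2k, k) for all a, b < k."""
    for a in range(k):
        for b in range(k):
            left = [f"L{i}" for i in range(a)]
            right = [f"R{i}" for i in range(b)]
            count = len(interleavings(left, right))
            assert count == comb(a + b, a) < comb(2 * k, k)
    return True
```

For example, two left tours and one right tour admit exactly three relative orders, and `check_bound(k)` succeeds for small k.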

Now we are able to apply a pumping argument to the segment y = y_1 ⋯ y_t. If t > K, then two of the prefixes y_1 ⋯ y_{i_1} and y_1 ⋯ y_{i_2}, with i_1 < i_2, define the same rearrangement on the adjacent tours, and thus we may replace y_1 ⋯ y_{i_2} by y_1 ⋯ y_{i_1} in the output xyz of M'_1. The resulting track x y_1 ⋯ y_{i_1} y_{i_2+1} ⋯ y_t z defines a computation for M_2 that results in the same output as the original track xyz. Thus, we may assume that t ≤ K.
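The pumping step is an instance of the pigeonhole principle. The sketch below abstracts "the rearrangement defined by a prefix" as a hypothetical deterministic summary `effect_step(summary, segment)` with finitely many values; this is a modelling assumption of ours, not a construction from the text. Whenever two prefixes y_1 ⋯ y_{i_1} and y_1 ⋯ y_{i_2} with i_1 < i_2 reach the same summary, the factors y_{i_1+1} ⋯ y_{i_2} are cut out, which leaves the summary of the whole segment unchanged. Unlike the text, the sketch also compares the empty prefix.

```python
def summary(segments, effect_step, start):
    """Summary reached after reading all segments from left to right."""
    state = start
    for seg in segments:
        state = effect_step(state, seg)
    return state

def shorten(segments, effect_step, start):
    """Pump down y = y_1 ... y_t until all prefix summaries are distinct.

    effect_step(state, segment) -> state is a hypothetical finite-valued
    summary of how a prefix rearranges the adjacent tours.
    """
    segs = list(segments)
    while True:
        seen = {start: 0}          # summary -> length of the prefix reaching it
        state = start
        for i, seg in enumerate(segs, start=1):
            state = effect_step(state, seg)
            if state in seen:
                i1 = seen[state]
                segs = segs[:i1] + segs[i:]   # replace y_1..y_i2 by y_1..y_i1
                break
            seen[state] = i
        else:
            return segs            # no repetition: length < number of summaries
```

Each round strictly shortens the list, so the loop terminates; with K possible summaries the result has fewer than K factors.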

Consequently, we allow for all possible rearrangements, and hence for all possible outputs of M_2, by taking K as the bound on the number of visits of M'_1 to a fixed position in a fixed state.

Now that we have limited the number of visits of M'_1 to K times the size of its state set, we can replace M'_1 by a decomposition in MREL ∘ 2DGSM, using again the argument of Lemma [?]. Thus, m'_1 ∘ m_2 is replaced by a composition in (MREL ∘ 2DGSM) ∘ 2DGSM. The result follows, as 2DGSM is closed under composition (Proposition [?]). □

The variable family X in the previous result allows us to apply the lemma 
in the context of an arbitrary sequence of 2gsm transductions. 

41 Theorem. Let m be a string transduction, and let k ≥ 1.
If m ∈ 2NGSM^k and m is finitary, then m ∈ MREL ∘ 2DGSM.

Proof. Observe that 2NGSM ∘ MREL ⊆ 2NGSM by an obvious construction.

Let k ≥ 1. Assume that m ∈ 2NGSM^k ∘ 2DGSM is finitary. By the previous lemma, m ∈ 2NGSM^{k−1} ∘ MREL ∘ 2DGSM, which equals 2NGSM^{k−1} ∘ 2DGSM for k > 1 (and which equals MREL ∘ 2DGSM for k = 1).

Hence, by induction on k, m ∈ 2NGSM^k ∘ 2DGSM implies m ∈ MREL ∘ 2DGSM, for a finitary string transduction m. As 2NGSM^k ∘ 2DGSM ⊇ 2NGSM^k, the theorem follows. □

42 Theorem. Let m be a string transduction. Then
m ∈ NMSOS iff m ∈ ⋃_{k≥1} 2NGSM^k and m is finitary.

Proof. By Theorem [?], NMSOS = 2NGSM^2_fin ⊆ 2NGSM^2. Additionally, elements of NMSOS are necessarily finitary. This proves the implication from left to right. The reverse implication follows from the last result and the characterization NMSOS = MREL ∘ 2DGSM from Theorem [?]. □
It is shown in [Eng82, Theorem 4.9] that every functional transduction in ⋃_{k≥1} 2NGSM^k is in 2DGSM. Together with Theorem 23 (MSOS = 2DGSM) this gives the following counterpart of Theorem 42.



43 Theorem. Let m be a string transduction. Then
m ∈ MSOS iff m ∈ ⋃_{k≥1} 2NGSM^k and m is functional.



A Venn diagram is given in Figure 10. It illustrates the results from Lemma 33 and Theorems 38, 42, and 43.



6.2 Hennie machines 

Extending a finite visit 2gsm with the possibility to rewrite the contents of the cell of the input tape that it is visiting, we obtain the Hennie machine, introduced in [Hen65] as an accepting device, and considered as a transducer in [Raj75] (under the name 'bounded crossing transducer'). Alternatively, a Hennie machine is a linear bounded automaton (as a transducer, so equipped with a one-way output tape) that is finite visit. We find it, somewhat disguised, in [Gre78b] as 'one way finite visit preset Turing machine', where the 'preset working tape' should be interpreted as input tape, and the 'one way input tape' as output tape.

It should be clear how to extend our basic 2gsm model to allow for writing on the input tape; thus we will refrain from giving the full 10-tuple formalization. The families of string transductions realized by nondeterministic and deterministic Hennie machines are denoted by NHM and DHM, respectively.

44 Example. Once again consider our running nondeterministic example (cf. Example 25)

m = { (a^n, ww) | n ≥ 0, w ∈ {a, b}*, |w| = n }.

It can be realized by a Hennie machine moving in two consecutive left-to-right passes over the input. First it nondeterministically rewrites the input a^n into a string w with |w| = n, while writing this string to the output tape; then it writes w again to the output, copying it from the rewritten input tape. Obviously, the machine is 3-visit. □
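A direct simulation makes the visit count explicit. The Python sketch below uses conventions of our own: the endmarkers are written '<' and '>', the head starts on the left endmarker, and the nondeterministic choices are supplied as an explicit, hypothetical `guess` string. It records how often each tape cell is visited; every cell holding an input symbol is visited exactly three times (once in each left-to-right pass and once on the way back).

```python
from collections import Counter

def hennie_copy(n, guess):
    """Simulate the Hennie machine of Example 44 on input a^n.

    `guess` fixes the nondeterministic choices: the symbol from {a, b}
    written into each input cell during the first pass.
    Returns the output string and a Counter of visits per tape cell.
    """
    if len(guess) != n or not set(guess) <= {"a", "b"}:
        raise ValueError("guess must be a string over {a, b} of length n")
    tape = ["<"] + ["a"] * n + [">"]   # cells 0 and n+1 hold the endmarkers
    visits = Counter()
    out = []

    visits[0] += 1                     # head starts on the left endmarker
    for i in range(1, n + 1):          # pass 1: rewrite a -> guessed symbol, output it
        visits[i] += 1
        tape[i] = guess[i - 1]
        out.append(tape[i])
    visits[n + 1] += 1                 # turn around at the right endmarker
    for i in range(n, -1, -1):         # walk back to the left endmarker, no output
        visits[i] += 1
    for i in range(1, n + 1):          # pass 2: copy the rewritten tape to the output
        visits[i] += 1
        out.append(tape[i])
    visits[n + 1] += 1                 # halt on the right endmarker
    return "".join(out), visits
```

For instance, `hennie_copy(3, "abb")` produces the output "abbabb"; ranging `guess` over all strings in {a, b}^n yields exactly the outputs ww with |w| = n.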



45 Theorem. NMSOS = NHM. 
Proof. In view of Theorem [?] it suffices to prove the equality NHM = MREL ∘ 2DGSM.

The inclusion of NHM in MREL ∘ 2DGSM can be proved as Lemma [?], which states this inclusion for 2NGSM_fin: the relabelling guesses a string of visiting sequences for the computation of the Hennie machine on the input string; the 2dgsm verifies that this string is a track and simulates the computation. Note that a visiting sequence of a Hennie machine should also record, at each visit, the symbol at that position of the input tape. It is straightforward to adapt the notions of visiting sequence and k-track in this way, such that Proposition 35 still holds (see [Gre78a, Gre78b, Bir96]).



The reverse inclusion is almost immediate. In two phases the Hennie 
machine may simulate the composition, first writing the image of the marked 
relabelling on the tape, and then simulating the 2dgsm on this new tape. 
There is a minor technicality: for a given input w the initial tape contains ⊢w⊣, and the Hennie machine is supposed to overwrite this string with its relabelling and add two new tape markers (for the simulation of the 2dgsm). Instead, it keeps the relabelling of the tape markers in its finite state memory, rather than overwriting them. □
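The second phase only needs a deterministic two-way gsm running on the rewritten tape. The Python sketch below uses conventions of our own: a transition table mapping (state, symbol) to (state, move, output), endmarkers '<' and '>', and a hypothetical machine COPY. The relabelling is simplified to a letter-by-letter choice from a table `allowed`, so the 'marked' aspect of MREL is not modelled. Composed with the relabelling that turns each a into a or b, COPY realizes the transduction of Example 44.

```python
LEFT, RIGHT = "<", ">"   # endmarker symbols (our own convention)

def run_2dgsm(delta, q0, finals, word, max_steps=10_000):
    """Run a deterministic two-way gsm on the tape <word>.

    delta maps (state, symbol) -> (new_state, move, output) with move in
    {-1, 0, +1}. Returns the output once a final state is reached, or None
    if the machine blocks.
    """
    tape = [LEFT] + list(word) + [RIGHT]
    q, pos, out = q0, 0, []
    for _ in range(max_steps):
        if q in finals:
            return "".join(out)
        if (q, tape[pos]) not in delta:
            return None                      # no transition: the machine blocks
        q, move, piece = delta[(q, tape[pos])]
        out.append(piece)
        pos += move
        if not 0 <= pos < len(tape):
            raise ValueError("head left the tape; delta must respect the endmarkers")
    raise RuntimeError("step bound exceeded; the machine may loop")

# Hypothetical 2dgsm COPY: output the tape contents, walk back, output them again.
COPY = {("p1", LEFT): ("p1", +1, ""), ("p1", RIGHT): ("back", -1, ""),
        ("back", LEFT): ("p2", +1, ""), ("p2", RIGHT): ("done", 0, "")}
for c in "ab":
    COPY[("p1", c)] = ("p1", +1, c)
    COPY[("back", c)] = ("back", -1, "")
    COPY[("p2", c)] = ("p2", +1, c)

def mrel_then_2dgsm(word, guess, allowed, delta, q0, finals):
    """Phase 1: accept `guess` as a relabelling of `word` if every letter is allowed.
    Phase 2: run the 2dgsm on the relabelled tape."""
    if len(guess) != len(word) or any(g not in allowed[a] for a, g in zip(word, guess)):
        return None
    return run_2dgsm(delta, q0, finals, guess)
```

For example, `mrel_then_2dgsm("aaa", "bab", {"a": "ab"}, COPY, "p1", {"done"})` returns "babbab". A Hennie machine performs both phases on a single tape; the sketch separates them only for readability.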

Restated as NHM = MREL ∘ 2DGSM, the above result generalizes the result of Rajlich [Raj75, Theorem 2.1] that the output languages of nondeterministic Hennie machines equal the output languages of two-way deterministic generalized sequential machines; see also [Gre78b, Thm 2.15(2)].



The above demonstration of the inclusion MREL ∘ 2DGSM ⊆ NHM can easily be extended to a proof of NHM ∘ NHM ⊆ NHM. A Hennie machine can simulate the composition of two of its colleagues by writing the visiting sequences of the first machine onto the input tape. The output of the first machine is contained in this string, conveniently folded over the input tape, ready to be used by the second machine.

We have, however, the closure of NHM under composition for free as a consequence of the above characterization and Proposition [?].

46 Corollary. NHM is closed under composition. 

In [ChJa77] it is noted that the inclusion DHM ∘ 2DGSM ⊆ 2DGSM can be proved analogously to their result that 2DGSM ∘ 2DGSM ⊆ 2DGSM (i.e., 2DGSM is closed under composition, Proposition [?]). That of course implies the equality of the families of transductions realized by deterministic Hennie machines and those realized by deterministic 2gsm. This equality is rephrased as follows.

47 Theorem. MSOS = DHM. 



Proof. In view of Theorem 23 it suffices to prove the equality DHM = 2DGSM. The inclusion DHM ⊇ 2DGSM is immediate. We demonstrate the reverse inclusion, much along the lines sketched in [ChJa77]; see also [Eng82, Theorem 4.9].



By Theorem 45, NHM = MREL ∘ 2DGSM. Hence, any Hennie transduction m_H can be decomposed into a marked relabelling ρ and a deterministic 2gsm transduction m_2. We will argue that for a deterministic Hennie transduction this (nondeterministic) marked relabelling can be realized by a deterministic 2gsm, which shows DHM ⊆ 2DGSM by the closure of 2DGSM under composition.

Let m_H be a deterministic Hennie transduction, and let m_H = ρ ∘ m_2 be the decomposition as above. Let w be an input string. As m_H is functional, m_H(w) = m_2(w') for any marked relabelling w' ∈ ρ(w) that belongs to the domain of m_2. As this domain dom(m_2) is a regular language [RaSc59, She59], a 2dgsm-rla can be constructed that finds and outputs such a marked relabelling in one pass from left to right over the input, using its look-around to check the remainder of the input for a relabelling of the present input symbol that leads to an element of dom(m_2). This means that the 2dgsm-rla looks ahead to test the suffix of the tape for membership in the language ρ^{-1}(L(A_q)), where A is a (fixed) one-way deterministic finite state automaton accepting dom(m_2), and A_q is A with its initial state changed to q, the state A would be in after reading the output generated by the 2dgsm-rla on the prefix, including the relabelling chosen for the present symbol. □
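The deterministic choice of a relabelling can be sketched in Python under assumptions of our own: the relabelling ρ is simplified to a table `rho` giving, for each input symbol, an ordered list of candidate symbols (markings are ignored), and dom(m_2) is given by a complete DFA table `delta`. A right-to-left pass precomputes, for every suffix, the set of states q for which the suffix lies in ρ^{-1}(L(A_q)); this set plays the role of the look-ahead test, and the left-to-right pass always keeps the first candidate that passes it.

```python
def choose_relabelling(word, rho, delta, q0, finals):
    """Deterministically pick a relabelling of `word` that lies in L(A).

    rho:   input symbol -> ordered list of candidate output symbols (hypothetical).
    delta: complete DFA transition table (state, symbol) -> state for A.
    Returns the chosen relabelling, or None if no relabelling lies in L(A).
    """
    n = len(word)
    states = {q for (q, _) in delta}
    # good[i]: states q such that some relabelling of word[i:] is accepted from q,
    # i.e. word[i:] belongs to rho^{-1}(L(A_q)).
    good = [set() for _ in range(n + 1)]
    good[n] = set(finals)
    for i in range(n - 1, -1, -1):
        good[i] = {q for q in states
                   if any(delta[(q, c)] in good[i + 1] for c in rho[word[i]])}
    if q0 not in good[0]:
        return None
    out, q = [], q0
    for i, a in enumerate(word):
        # look-ahead test: keep the first candidate whose successor state can
        # still be completed to acceptance on the remaining suffix
        c = next(c for c in rho[a] if delta[(q, c)] in good[i + 1])
        out.append(c)
        q = delta[(q, c)]
    return "".join(out)
```

With a DFA for the strings over {x, y} ending in y and rho = {"a": ["x", "y"]}, the input "aaa" is relabelled to "xxy". The result depends only on the input word, which is what allows a deterministic machine to replace the nondeterministic relabelling.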

Finale. In this section we have obtained a rather precise characterization of mso definable string transductions in terms of Hennie transductions, both in the deterministic and in the nondeterministic case. Intuitively, an important reason for this equivalence is the inherent boundedness of both types of transductions: mso definable transductions have a bound on the number of copies, whereas Hennie machines have a bound on the number of visits to each of the tape positions.
In the case of determinism these two families are equal to the family of transductions realized by two-way generalized sequential machines (Theorem 23). This should be contrasted with the nondeterministic case, where a 2gsm is unable to record the choices made during its computation, whereas a Hennie machine may use its tape for this purpose.
We summarize.

48 Theorem. 

1. MSOS = DHM = 2DGSM. 

2. NMSOS = NHM = MREL ∘ 2DGSM = 2NGSM^2_fin.

Now that the families NMSOS and 2NGSM have been shown to be incomparable, unlike their deterministic counterparts, one may look for natural variants of the families that have the same power. For machines we have discussed such a variant: by extending the model with the power of rewriting its input tape (and at the same time demanding the finite visit property) we obtain the Hennie transductions. We leave it as an open problem how to introduce a variant of nondeterminism for mso definable transductions that corresponds to 2ngsm. Additionally, we did not consider transductions realized by one-way transducers. Another remaining problem of interest is the power of first-order logic to define string transductions (where, in Definition [?], we assume all formulas to be first-order; see Example [?]). Note that even for C = {1} there are first-order definable string transductions that cannot be realized by one-way transducers (such as transforming a string into its reversal). The class of first-order definable string transductions (with respect to nd-gr) such that C = {1} and φ^{1,1}(x, y) = edge(x, y) is characterized in [LMSV] as the class of all transductions that can be realized by functional aperiodic nondeterministic one-way sequential machines (where a sequential machine is a gsm that outputs exactly one symbol at each step). The equivalence of aperiodic finite state automata and first-order logic was established in [MNPa71].
[Figure 10: Relationships between our main families of transductions. Venn diagram with regions ⋃_k 2NGSM^k, 2NGSM, 2NGSM_fin, NMSOS = 2NGSM^2_fin, and MSOS = 2DGSM; the finitary and the functional transductions are indicated.]